Hypothesis

Postnatal environmental exposures, particularly those found in household products and dietary intake, together with specific serum metabolomic profiles, are significantly associated with BMI Z-scores in children aged 6-11 years. Higher serum concentrations of certain metabolites, reflecting exposure to specific chemical classes or metals, will correlate with variations in BMI Z-score after controlling for age and other relevant covariates. Metabolites associated with chemical exposures and dietary patterns may serve as biomarkers of obesity risk.
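To make the hypothesis concrete, the association could be tested with a covariate-adjusted linear model. The sketch below is illustrative only: it assumes the HELIX data frames (`exposome`, `covariates`, and a `phenotype` table) can be merged by child `ID`, and uses BPA as an example exposure; variable names follow the codebook shown later in this document.

```r
library(dplyr)

# Sketch: merge the HELIX tables by child ID (assumed structure)
analysis_df <- exposome %>%
  inner_join(covariates, by = "ID") %>%
  inner_join(phenotype, by = "ID")

# BMI z-score regressed on one log2-transformed exposure,
# adjusting for child age, sex, and cohort
fit <- lm(hs_zbmi_who ~ hs_bpa_cadj_Log2 + hs_child_age_None +
            e3_sex_None + h_cohort, data = analysis_df)
summary(fit)
```

In practice a model of this form would be repeated across all candidate exposures, with an appropriate multiple-testing correction.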

Background

Research indicates that postnatal exposure to endocrine-disrupting chemicals (EDCs) such as phthalates, bisphenol A (BPA), and polychlorinated biphenyls (PCBs) can significantly influence body weight and metabolic health (Junge et al., 2018). These chemicals are commonly found in household products and absorbed through dietary intake. By interfering with the hormones that regulate metabolism, they can raise body mass index (BMI) in children, suggesting one pathway through which chemical exposure contributes to the development of obesity.

A longitudinal study of Japanese children examined the impact of postnatal exposure (the first two years of life) to p,p’-dichlorodiphenyltrichloroethane (p,p’-DDT) and p,p’-dichlorodiphenyldichloroethylene (p,p’-DDE) through breastfeeding (Plouffe et al., 2020). Higher levels of these chemicals in breast milk were associated with increased BMI at 42 months of age. DDT and DDE can mimic or disrupt the hormones that regulate metabolism, fat accumulation, and growth, and this study highlights how persistent organic pollutants can affect early childhood growth and development.

The study by Harley et al. (2013) investigated the association between prenatal and postnatal bisphenol A (BPA) exposure and body composition metrics in 9-year-old children from the CHAMACOS cohort. Higher prenatal BPA exposure was linked to lower BMI and body fat percentage in girls but not boys, suggesting sex-specific effects. Conversely, BPA levels measured at age 9 were positively associated with adiposity in both sexes, highlighting how the timing of exposure shapes its impact on childhood development.

The 2022 study by Uldbjerg et al. explored the effects of combined exposures to multiple EDCs, suggesting that mixtures of these chemicals can have additive or synergistic effects on BMI and obesity risk. Humans are typically exposed to mixtures of chemicals rather than individual EDCs, making it crucial to understand how these mixtures interact. Interactions between different EDCs can be additive (the effects simply sum) or synergistic (the combined effect exceeds the sum of the separate effects), and such interactions can substantially amplify the risk of obesity and metabolic disorders in children. Dose-response analyses indicated that even low-level exposure to multiple EDCs could produce significant health impacts through combined effects.
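The additive-versus-synergistic distinction can be expressed as a product-interaction term in a regression model. This is a minimal sketch, not the method used by Uldbjerg et al.; it assumes a merged data frame `analysis_df` containing the outcome and two example exposures drawn from the codebook below.

```r
# Additive model: the two EDC effects simply sum
fit_add <- lm(hs_zbmi_who ~ hs_bpa_cadj_Log2 + hs_dde_cadj_Log2,
              data = analysis_df)

# Interaction model: a nonzero product term means the combined effect
# departs from additivity (potential synergy or antagonism)
fit_int <- lm(hs_zbmi_who ~ hs_bpa_cadj_Log2 * hs_dde_cadj_Log2,
              data = analysis_df)

# F-test comparing the nested models assesses the interaction term
anova(fit_add, fit_int)
```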

These studies collectively illustrate the critical role of environmental EDCs in shaping metabolic health outcomes in children, highlighting the necessity for ongoing research and policy intervention to mitigate these risks.

Data Description

This study will utilize data from a subcohort of 1,301 mother-child pairs in the HELIX study, restricted to children aged 6-11 years for whom complete exposure and outcome data were available. Exposure data include detailed postnatal dietary records and concentrations of chemicals such as BPA and PCBs measured in child blood and urine samples. The dataset contains both categorical and numerical variables, covering demographic details and biochemical measurements, and supports robust statistical analysis of associations between EDC exposure and BMI Z-scores while accounting for confounders such as age, sex, and socioeconomic status. There are no missing data, so no imputation is required. Child BMI Z-scores were calculated from WHO growth standards, standardized on sex and age.
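The claim of complete data can be verified directly. A quick check (a sketch, assuming the HELIX `.RData` file loads data frames named `exposome`, `covariates`, and `phenotype`, as used below) would be:

```r
# Count missing values in each table; all counts should be zero
sapply(list(exposome = exposome, covariates = covariates,
            phenotype = phenotype),
       function(df) sum(is.na(df)))
```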

# libraries used throughout this analysis
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)
library(kableExtra)
library(summarytools)
library(corrplot)

load("/Users/allison/Library/CloudStorage/GoogleDrive-aflouie@usc.edu/My Drive/HELIX_data/HELIX.RData")

# postnatal chemical and lifestyle exposures (excluding allergens)
filtered_chem_diet <- codebook %>%
  filter(domain %in% c("Chemicals", "Lifestyles") & period == "Postnatal" & subfamily != "Allergens")

# specific covariates
filtered_covariates <- codebook %>%
  filter(domain == "Covariates" & 
         variable_name %in% c("ID", "e3_sex_None", "e3_yearbir_None", "h_cohort", "hs_child_age_None"))

#specific phenotype variables
filtered_phenotype <- codebook %>%
  filter(domain == "Phenotype" & 
         variable_name %in% c("hs_zbmi_who"))

# combining all necessary variables together
combined_codebook <- bind_rows(filtered_chem_diet, filtered_covariates, filtered_phenotype)
kable(combined_codebook, align = "c", format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F)
variable_name domain family subfamily period location period_postnatal description var_type transformation labels labelsshort
h_bfdur_Ter h_bfdur_Ter Lifestyles Lifestyle Diet Postnatal NA NA Breastfeeding duration (weeks) factor Tertiles Breastfeeding Breastfeeding
hs_bakery_prod_Ter hs_bakery_prod_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: bakery products (hs_cookies + hs_pastries) factor Tertiles Bakery prod BakeProd
hs_beverages_Ter hs_beverages_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: beverages (hs_dietsoda+hs_soda) factor Tertiles Soda Soda
hs_break_cer_Ter hs_break_cer_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: breakfast cereal (hs_sugarcer+hs_othcer) factor Tertiles BF cereals BFcereals
hs_caff_drink_Ter hs_caff_drink_Ter Lifestyles Lifestyle Diet Postnatal NA NA Drinks a caffeinated or energy drink (eg coca-cola, diet-coke, redbull) factor Tertiles Caffeine Caffeine
hs_dairy_Ter hs_dairy_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: dairy (hs_cheese + hs_milk + hs_yogurt+ hs_probiotic+ hs_desert) factor Tertiles Dairy Dairy
hs_fastfood_Ter hs_fastfood_Ter Lifestyles Lifestyle Diet Postnatal NA NA Visits a fast food restaurant/take away factor Tertiles Fastfood Fastfood
hs_KIDMED_None hs_KIDMED_None Lifestyles Lifestyle Diet Postnatal NA NA Sum of KIDMED indices, without index9 numeric None KIDMED KIDMED
hs_mvpa_prd_alt_None hs_mvpa_prd_alt_None Lifestyles Lifestyle Physical activity Postnatal NA NA Clean & Over-reporting of Moderate-to-Vigorous Physical Activity (min/day) numeric None PA PA
hs_org_food_Ter hs_org_food_Ter Lifestyles Lifestyle Diet Postnatal NA NA Eats organic food factor Tertiles Organicfood Organicfood
hs_proc_meat_Ter hs_proc_meat_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: processed meat (hs_coldmeat+hs_ham) factor Tertiles Processed meat ProcMeat
hs_readymade_Ter hs_readymade_Ter Lifestyles Lifestyle Diet Postnatal NA NA Eats a ready-made supermarket meal factor Tertiles Ready made food ReadyFood
hs_sd_wk_None hs_sd_wk_None Lifestyles Lifestyle Physical activity Postnatal NA NA sedentary behaviour (min/day) numeric None Sedentary Sedentary
hs_total_bread_Ter hs_total_bread_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: bread (hs_darkbread+hs_whbread) factor Tertiles Bread Bread
hs_total_cereal_Ter hs_total_cereal_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: cereal (hs_darkbread + hs_whbread + hs_rice_pasta + hs_sugarcer + hs_othcer + hs_rusks) factor Tertiles Cereals Cereals
hs_total_fish_Ter hs_total_fish_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: fish and seafood (hs_canfish+hs_oilyfish+hs_whfish+hs_seafood) factor Tertiles Fish Fish
hs_total_fruits_Ter hs_total_fruits_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: fruits (hs_canfruit+hs_dryfruit+hs_freshjuice+hs_fruits) factor Tertiles Fruits Fruits
hs_total_lipids_Ter hs_total_lipids_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: Added fat factor Tertiles Diet fat Diet fat
hs_total_meat_Ter hs_total_meat_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: meat (hs_coldmeat+hs_ham+hs_poultry+hs_redmeat) factor Tertiles Meat Meat
hs_total_potatoes_Ter hs_total_potatoes_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: potatoes (hs_frenchfries+hs_potatoes) factor Tertiles Potatoes Potatoes
hs_total_sweets_Ter hs_total_sweets_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: sweets (hs_choco + hs_sweets + hs_sugar) factor Tertiles Sweets Sweets
hs_total_veg_Ter hs_total_veg_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: vegetables (hs_cookveg+hs_rawveg) factor Tertiles Vegetables Vegetables
hs_total_yog_Ter hs_total_yog_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: yogurt (hs_yogurt+hs_probiotic) factor Tertiles Yogurt Yogurt
hs_dif_hours_total_None hs_dif_hours_total_None Lifestyles Lifestyle Sleep Postnatal NA NA Total hours of sleep (mean weekdays and night) numeric None Sleep Sleep
hs_as_c_Log2 hs_as_c_Log2 Chemicals Metals As Postnatal NA NA Arsenic (As) in child numeric Logarithm base 2 As As
hs_cd_c_Log2 hs_cd_c_Log2 Chemicals Metals Cd Postnatal NA NA Cadmium (Cd) in child numeric Logarithm base 2 Cd Cd
hs_co_c_Log2 hs_co_c_Log2 Chemicals Metals Co Postnatal NA NA Cobalt (Co) in child numeric Logarithm base 2 Co Co
hs_cs_c_Log2 hs_cs_c_Log2 Chemicals Metals Cs Postnatal NA NA Caesium (Cs) in child numeric Logarithm base 2 Cs Cs
hs_cu_c_Log2 hs_cu_c_Log2 Chemicals Metals Cu Postnatal NA NA Copper (Cu) in child numeric Logarithm base 2 Cu Cu
hs_hg_c_Log2 hs_hg_c_Log2 Chemicals Metals Hg Postnatal NA NA Mercury (Hg) in child numeric Logarithm base 2 Hg Hg
hs_mn_c_Log2 hs_mn_c_Log2 Chemicals Metals Mn Postnatal NA NA Manganese (Mn) in child numeric Logarithm base 2 Mn Mn
hs_mo_c_Log2 hs_mo_c_Log2 Chemicals Metals Mo Postnatal NA NA Molybdenum (Mo) in child numeric Logarithm base 2 Mo Mo
hs_pb_c_Log2 hs_pb_c_Log2 Chemicals Metals Pb Postnatal NA NA Lead (Pb) in child numeric Logarithm base 2 Pb Pb
hs_tl_cdich_None hs_tl_cdich_None Chemicals Metals Tl Postnatal NA NA Dichotomous variable of thallium (Tl) in child factor None Tl Tl
hs_dde_cadj_Log2 hs_dde_cadj_Log2 Chemicals Organochlorines DDE Postnatal NA NA Dichlorodiphenyldichloroethylene (DDE) in child adjusted for lipids numeric Logarithm base 2 DDE DDE
hs_ddt_cadj_Log2 hs_ddt_cadj_Log2 Chemicals Organochlorines DDT Postnatal NA NA Dichlorodiphenyltrichloroethane (DDT) in child adjusted for lipids numeric Logarithm base 2 DDT DDT
hs_hcb_cadj_Log2 hs_hcb_cadj_Log2 Chemicals Organochlorines HCB Postnatal NA NA Hexachlorobenzene (HCB) in child adjusted for lipids numeric Logarithm base 2 HCB HCB
hs_pcb118_cadj_Log2 hs_pcb118_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Polychlorinated biphenyl -118 (PCB-118) in child adjusted for lipids numeric Logarithm base 2 PCB 118 PCB118
hs_pcb138_cadj_Log2 hs_pcb138_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Polychlorinated biphenyl-138 (PCB-138) in child adjusted for lipids numeric Logarithm base 2 PCB 138 PCB138
hs_pcb153_cadj_Log2 hs_pcb153_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Polychlorinated biphenyl-153 (PCB-153) in child adjusted for lipids numeric Logarithm base 2 PCB 153 PCB153
hs_pcb170_cadj_Log2 hs_pcb170_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Polychlorinated biphenyl-170 (PCB-170) in child adjusted for lipids numeric Logarithm base 2 PCB 170 PCB170
hs_pcb180_cadj_Log2 hs_pcb180_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Polychlorinated biphenyl-180 (PCB-180) in child adjusted for lipids numeric Logarithm base 2 PCB 180 PCB180
hs_sumPCBs5_cadj_Log2 hs_sumPCBs5_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Sum of PCBs in child adjusted for lipids (4 cohorts) numeric Logarithm base 2 PCBs SumPCB
hs_dep_cadj_Log2 hs_dep_cadj_Log2 Chemicals Organophosphate pesticides DEP Postnatal NA NA Diethyl phosphate (DEP) in child adjusted for creatinine numeric Logarithm base 2 DEP DEP
hs_detp_cadj_Log2 hs_detp_cadj_Log2 Chemicals Organophosphate pesticides DETP Postnatal NA NA Diethyl thiophosphate (DETP) in child adjusted for creatinine numeric Logarithm base 2 DETP DETP
hs_dmdtp_cdich_None hs_dmdtp_cdich_None Chemicals Organophosphate pesticides DMDTP Postnatal NA NA Dichotomous variable of dimethyl dithiophosphate (DMDTP) in child factor None DMDTP DMDTP
hs_dmp_cadj_Log2 hs_dmp_cadj_Log2 Chemicals Organophosphate pesticides DMP Postnatal NA NA Dimethyl phosphate (DMP) in child adjusted for creatinine numeric Logarithm base 2 DMP DMP
hs_dmtp_cadj_Log2 hs_dmtp_cadj_Log2 Chemicals Organophosphate pesticides DMTP Postnatal NA NA Dimethyl thiophosphate (DMTP) in child adjusted for creatinine numeric Logarithm base 2 DMTP DMTP
hs_pbde153_cadj_Log2 hs_pbde153_cadj_Log2 Chemicals Polybrominated diphenyl ethers (PBDE) PBDE153 Postnatal NA NA Polybrominated diphenyl ether-153 (PBDE-153) in child adjusted for lipids numeric Logarithm base 2 PBDE 153 PBDE153
hs_pbde47_cadj_Log2 hs_pbde47_cadj_Log2 Chemicals Polybrominated diphenyl ethers (PBDE) PBDE47 Postnatal NA NA Polybrominated diphenyl ether-47 (PBDE-47) in child adjusted for lipids numeric Logarithm base 2 PBDE 47 PBDE47
hs_pfhxs_c_Log2 hs_pfhxs_c_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFHXS Postnatal NA NA Perfluorohexane sulfonate (PFHXS) in child numeric Logarithm base 2 PFHXS PFHXS
hs_pfna_c_Log2 hs_pfna_c_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFNA Postnatal NA NA Perfluorononanoate (PFNA) in child numeric Logarithm base 2 PFNA PFNA
hs_pfoa_c_Log2 hs_pfoa_c_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFOA Postnatal NA NA Perfluorooctanoate (PFOA) in child numeric Logarithm base 2 PFOA PFOA
hs_pfos_c_Log2 hs_pfos_c_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFOS Postnatal NA NA Perfluorooctane sulfonate (PFOS) in child numeric Logarithm base 2 PFOS PFOS
hs_pfunda_c_Log2 hs_pfunda_c_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFUNDA Postnatal NA NA Perfluoroundecanoate (PFUNDA) in child numeric Logarithm base 2 PFUNDA PFUNDA
hs_bpa_cadj_Log2 hs_bpa_cadj_Log2 Chemicals Phenols BPA Postnatal NA NA Bisphenol A (BPA) in child adjusted for creatinine numeric Logarithm base 2 BPA BPA
hs_bupa_cadj_Log2 hs_bupa_cadj_Log2 Chemicals Phenols BUPA Postnatal NA NA N-Butyl paraben (BUPA) in child adjusted for creatinine numeric Logarithm base 2 BUPA BUPA
hs_etpa_cadj_Log2 hs_etpa_cadj_Log2 Chemicals Phenols ETPA Postnatal NA NA Ethyl paraben (ETPA) in child adjusted for creatinine numeric Logarithm base 2 ETPA ETPA
hs_mepa_cadj_Log2 hs_mepa_cadj_Log2 Chemicals Phenols MEPA Postnatal NA NA Methyl paraben (MEPA) in child adjusted for creatinine numeric Logarithm base 2 MEPA MEPA
hs_oxbe_cadj_Log2 hs_oxbe_cadj_Log2 Chemicals Phenols OXBE Postnatal NA NA Oxybenzone (OXBE) in child adjusted for creatinine numeric Logarithm base 2 OXBE OXBE
hs_prpa_cadj_Log2 hs_prpa_cadj_Log2 Chemicals Phenols PRPA Postnatal NA NA Propyl paraben (PRPA) in child adjusted for creatinine numeric Logarithm base 2 PRPA PRPA
hs_trcs_cadj_Log2 hs_trcs_cadj_Log2 Chemicals Phenols TRCS Postnatal NA NA Triclosan (TRCS) in child adjusted for creatinine numeric Logarithm base 2 TRCS TRCS
hs_mbzp_cadj_Log2 hs_mbzp_cadj_Log2 Chemicals Phthalates MBZP Postnatal NA NA Mono benzyl phthalate (MBzP) in child adjusted for creatinine numeric Logarithm base 2 MBZP MBZP
hs_mecpp_cadj_Log2 hs_mecpp_cadj_Log2 Chemicals Phthalates MECPP Postnatal NA NA Mono-2-ethyl 5-carboxypentyl phthalate (MECPP) in child adjusted for creatinine numeric Logarithm base 2 MECPP MECPP
hs_mehhp_cadj_Log2 hs_mehhp_cadj_Log2 Chemicals Phthalates MEHHP Postnatal NA NA Mono-2-ethyl-5-hydroxyhexyl phthalate (MEHHP) in child adjusted for creatinine numeric Logarithm base 2 MEHHP MEHHP
hs_mehp_cadj_Log2 hs_mehp_cadj_Log2 Chemicals Phthalates MEHP Postnatal NA NA Mono-2-ethylhexyl phthalate (MEHP) in child adjusted for creatinine numeric Logarithm base 2 MEHP MEHP
hs_meohp_cadj_Log2 hs_meohp_cadj_Log2 Chemicals Phthalates MEOHP Postnatal NA NA Mono-2-ethyl-5-oxohexyl phthalate (MEOHP) in child adjusted for creatinine numeric Logarithm base 2 MEOHP MEOHP
hs_mep_cadj_Log2 hs_mep_cadj_Log2 Chemicals Phthalates MEP Postnatal NA NA Monoethyl phthalate (MEP) in child adjusted for creatinine numeric Logarithm base 2 MEP MEP
hs_mibp_cadj_Log2 hs_mibp_cadj_Log2 Chemicals Phthalates MIBP Postnatal NA NA Mono-iso-butyl phthalate (MiBP) in child adjusted for creatinine numeric Logarithm base 2 MIBP MIBP
hs_mnbp_cadj_Log2 hs_mnbp_cadj_Log2 Chemicals Phthalates MNBP Postnatal NA NA Mono-n-butyl phthalate (MnBP) in child adjusted for creatinine numeric Logarithm base 2 MNBP MNBP
hs_ohminp_cadj_Log2 hs_ohminp_cadj_Log2 Chemicals Phthalates OHMiNP Postnatal NA NA Mono-4-methyl-7-hydroxyoctyl phthalate (OHMiNP) in child adjusted for creatinine numeric Logarithm base 2 OHMiNP OHMiNP
hs_oxominp_cadj_Log2 hs_oxominp_cadj_Log2 Chemicals Phthalates OXOMINP Postnatal NA NA Mono-4-methyl-7-oxooctyl phthalate (OXOMiNP) in child adjusted for creatinine numeric Logarithm base 2 OXOMINP OXOMINP
hs_sumDEHP_cadj_Log2 hs_sumDEHP_cadj_Log2 Chemicals Phthalates DEHP Postnatal NA NA Sum of DEHP metabolites (µg/g) in child adjusted for creatinine numeric Logarithm base 2 DEHP SumDEHP
FAS_cat_None FAS_cat_None Chemicals Social and economic capital Economic capital Postnatal NA NA Family affluence score factor None Family affluence FamAfl
hs_contactfam_3cat_num_None hs_contactfam_3cat_num_None Chemicals Social and economic capital Social capital Postnatal NA NA social capital: family friends factor None Social contact SocCont
hs_hm_pers_None hs_hm_pers_None Chemicals Social and economic capital Social capital Postnatal NA NA How many people live in your home? numeric None House crowding HouseCrow
hs_participation_3cat_None hs_participation_3cat_None Chemicals Social and economic capital Social capital Postnatal NA NA social capital: structural factor None Social participation SocPartic
hs_cotinine_cdich_None hs_cotinine_cdich_None Chemicals Tobacco Smoke Cotinine Postnatal NA NA Dichotomous variable of cotinine in child factor None Cotinine Cotinine
hs_globalexp2_None hs_globalexp2_None Chemicals Tobacco Smoke Tobacco Smoke Postnatal NA NA Global exposure of the child to ETS (2 categories) factor None ETS ETS
hs_smk_parents_None hs_smk_parents_None Chemicals Tobacco Smoke Tobacco Smoke Postnatal NA NA Tobacco Smoke status of parents (both) factor None Smoking_parents SmokPar
e3_sex_None e3_sex_None Covariates Covariates Child covariate Pregnancy NA NA Child sex (female / male) factor None Child sex Sex
e3_yearbir_None e3_yearbir_None Covariates Covariates Child covariate Pregnancy NA NA Year of birth (2003 to 2009) factor None Year of birth YearBirth
h_cohort h_cohort Covariates Covariates Maternal covariate Pregnancy NA NA Cohort of inclusion (1 to 6) factor None Cohort Cohort
hs_child_age_None hs_child_age_None Covariates Covariates Child covariate Postnatal NA NA Child age at examination (years) numeric None Child age cAge
hs_zbmi_who hs_zbmi_who Phenotype Phenotype Outcome at 6-11 years old Postnatal NA NA Body mass index z-score at 6-11 years old - WHO reference - Standardized on sex and age numeric None Body mass index z-score zBMI

Data Summary for Exposures, Covariates, and Outcome

Data Summary Exposures: Lifestyles

# specific lifestyle exposures
lifestyle_exposures <- c(
  "h_bfdur_Ter",
  "hs_bakery_prod_Ter",
  "hs_break_cer_Ter",
  "hs_dairy_Ter",
  "hs_fastfood_Ter",
  "hs_org_food_Ter",
  "hs_proc_meat_Ter",
  "hs_total_fish_Ter",
  "hs_total_fruits_Ter",
  "hs_total_lipids_Ter",
  "hs_total_sweets_Ter",
  "hs_total_veg_Ter"
)

lifestyle_exposome <- dplyr::select(exposome, all_of(lifestyle_exposures))
summarytools::view(dfSummary(lifestyle_exposome, style = 'grid', plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
No Variable Stats / Values Freqs (% of Valid) Graph Missing
1 h_bfdur_Ter [factor]
1. (0,10.8]
2. (10.8,34.9]
3. (34.9,Inf]
506(38.9%)
270(20.8%)
525(40.4%)
0 (0.0%)
2 hs_bakery_prod_Ter [factor]
1. (0,2]
2. (2,6]
3. (6,Inf]
345(26.5%)
423(32.5%)
533(41.0%)
0 (0.0%)
3 hs_break_cer_Ter [factor]
1. (0,1.1]
2. (1.1,5.5]
3. (5.5,Inf]
291(22.4%)
521(40.0%)
489(37.6%)
0 (0.0%)
4 hs_dairy_Ter [factor]
1. (0,14.6]
2. (14.6,25.6]
3. (25.6,Inf]
359(27.6%)
465(35.7%)
477(36.7%)
0 (0.0%)
5 hs_fastfood_Ter [factor]
1. (0,0.132]
2. (0.132,0.5]
3. (0.5,Inf]
143(11.0%)
603(46.3%)
555(42.7%)
0 (0.0%)
6 hs_org_food_Ter [factor]
1. (0,0.132]
2. (0.132,1]
3. (1,Inf]
429(33.0%)
396(30.4%)
476(36.6%)
0 (0.0%)
7 hs_proc_meat_Ter [factor]
1. (0,1.5]
2. (1.5,4]
3. (4,Inf]
366(28.1%)
471(36.2%)
464(35.7%)
0 (0.0%)
8 hs_total_fish_Ter [factor]
1. (0,1.5]
2. (1.5,3]
3. (3,Inf]
389(29.9%)
454(34.9%)
458(35.2%)
0 (0.0%)
9 hs_total_fruits_Ter [factor]
1. (0,7]
2. (7,14.1]
3. (14.1,Inf]
413(31.7%)
407(31.3%)
481(37.0%)
0 (0.0%)
10 hs_total_lipids_Ter [factor]
1. (0,3]
2. (3,7]
3. (7,Inf]
397(30.5%)
403(31.0%)
501(38.5%)
0 (0.0%)
11 hs_total_sweets_Ter [factor]
1. (0,4.1]
2. (4.1,8.5]
3. (8.5,Inf]
344(26.4%)
516(39.7%)
441(33.9%)
0 (0.0%)
12 hs_total_veg_Ter [factor]
1. (0,6]
2. (6,8.5]
3. (8.5,Inf]
404(31.1%)
314(24.1%)
583(44.8%)
0 (0.0%)

Generated by summarytools 1.0.1 (R version 4.4.0)
2024-07-15

categorical_lifestyle <- lifestyle_exposome %>% 
  dplyr::select(where(is.factor))

categorical_lifestyle_long <- pivot_longer(
  categorical_lifestyle,
  cols = everything(),
  names_to = "variable",
  values_to = "value"
)

unique_categorical_vars <- unique(categorical_lifestyle_long$variable)
categorical_plots <- lapply(unique_categorical_vars, function(var) {
  data <- filter(categorical_lifestyle_long, variable == var)
  
  p <- ggplot(data, aes(x = value, fill = value)) +
    geom_bar(stat = "count") +
    labs(title = paste("Distribution of", var), x = var, y = "Count")
  
  print(p)
  return(p)
})

Breastfeeding Duration: The highest (40.4%) and lowest (38.9%) tertiles are similarly common, with far fewer children in the middle tertile, suggesting breastfeeding duration is polarized between short and long.

Bakery Products: Consumption skews toward the highest tertile (41.0%), with the lowest tertile least represented.

Breakfast Cereal: The middle tertile is the most common (40.0%), followed closely by the highest, suggesting moderate-to-high cereal consumption is typical.

Dairy: Shows a fairly even distribution across all categories, indicating a uniform consumption pattern of dairy products.

Fast Food: Most participants fall into the middle category, indicating moderate consumption of fast food.

Organic Food: Consumption is spread fairly evenly across the three tertiles, with the highest tertile slightly most common (36.6%).

Processed Meat: Consumption levels are fairly evenly distributed, indicating varied dietary habits regarding processed meats.

Fish and Seafood: Even distribution across categories, indicating varied consumption of fish and seafood.

Fruits: The highest tertile is the most common (37.0%), with the lower two tertiles nearly equal.

Added Fats: Consumption skews toward the highest tertile (38.5%), with the lower two tertiles nearly equal.

Sweets: The middle tertile is the most common (39.7%), with the lowest tertile least represented.

Vegetables: The highest tertile is the most common (44.8%), with the middle tertile least represented.

Data Summary Exposures: Chemicals

# specific chemical exposures
chemical_exposures <- c(
  "hs_cd_c_Log2",
  "hs_co_c_Log2",
  "hs_cs_c_Log2",
  "hs_cu_c_Log2",
  "hs_hg_c_Log2",
  "hs_mo_c_Log2",
  "hs_pb_c_Log2",
  "hs_dde_cadj_Log2",
  "hs_pcb153_cadj_Log2",
  "hs_pcb170_cadj_Log2",
  "hs_dep_cadj_Log2",
  "hs_pbde153_cadj_Log2",
  "hs_pfhxs_c_Log2",
  "hs_pfoa_c_Log2",
  "hs_pfos_c_Log2",
  "hs_prpa_cadj_Log2",
  "hs_mbzp_cadj_Log2",
  "hs_mibp_cadj_Log2",
  "hs_mnbp_cadj_Log2"
)

chemical_exposome <- dplyr::select(exposome, all_of(chemical_exposures))
summarytools::view(dfSummary(chemical_exposome, style = 'grid', plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
No Variable Stats / Values Freqs (% of Valid) Graph Missing
1 hs_cd_c_Log2 [numeric]
Mean (sd) : -4 (1)
min ≤ med ≤ max:
-10.4 ≤ -3.8 ≤ 0.8
IQR (CV) : 1 (-0.3)
695 distinct values 0 (0.0%)
2 hs_co_c_Log2 [numeric]
Mean (sd) : -2.3 (0.6)
min ≤ med ≤ max:
-5.5 ≤ -2.4 ≤ 1.4
IQR (CV) : 0.7 (-0.3)
317 distinct values 0 (0.0%)
3 hs_cs_c_Log2 [numeric]
Mean (sd) : 0.4 (0.6)
min ≤ med ≤ max:
-1.5 ≤ 0.5 ≤ 3.1
IQR (CV) : 0.8 (1.3)
369 distinct values 0 (0.0%)
4 hs_cu_c_Log2 [numeric]
Mean (sd) : 9.8 (0.2)
min ≤ med ≤ max:
9.1 ≤ 9.8 ≤ 12.1
IQR (CV) : 0.3 (0)
345 distinct values 0 (0.0%)
5 hs_hg_c_Log2 [numeric]
Mean (sd) : -0.3 (1.7)
min ≤ med ≤ max:
-10.9 ≤ -0.2 ≤ 3.7
IQR (CV) : 2.1 (-5.6)
698 distinct values 0 (0.0%)
6 hs_mo_c_Log2 [numeric]
Mean (sd) : -0.3 (0.9)
min ≤ med ≤ max:
-9.2 ≤ -0.4 ≤ 5.1
IQR (CV) : 0.8 (-2.9)
593 distinct values 0 (0.0%)
7 hs_pb_c_Log2 [numeric]
Mean (sd) : 3.1 (0.6)
min ≤ med ≤ max:
1.1 ≤ 3.1 ≤ 7.7
IQR (CV) : 0.8 (0.2)
529 distinct values 0 (0.0%)
8 hs_dde_cadj_Log2 [numeric]
Mean (sd) : 4.7 (1.5)
min ≤ med ≤ max:
1.2 ≤ 4.5 ≤ 11.1
IQR (CV) : 1.9 (0.3)
1050 distinct values 0 (0.0%)
9 hs_pcb153_cadj_Log2 [numeric]
Mean (sd) : 3.6 (0.9)
min ≤ med ≤ max:
1.2 ≤ 3.5 ≤ 7.8
IQR (CV) : 1.4 (0.3)
1047 distinct values 0 (0.0%)
10 hs_pcb170_cadj_Log2 [numeric]
Mean (sd) : -0.3 (3)
min ≤ med ≤ max:
-16.8 ≤ 0.3 ≤ 4.8
IQR (CV) : 2.2 (-9.8)
1039 distinct values 0 (0.0%)
11 hs_dep_cadj_Log2 [numeric]
Mean (sd) : 0.2 (3.2)
min ≤ med ≤ max:
-12.6 ≤ 0.9 ≤ 9.4
IQR (CV) : 3.3 (20)
1045 distinct values 0 (0.0%)
12 hs_pbde153_cadj_Log2 [numeric]
Mean (sd) : -4.5 (3.8)
min ≤ med ≤ max:
-17.6 ≤ -2.6 ≤ 4
IQR (CV) : 6.7 (-0.8)
1036 distinct values 0 (0.0%)
13 hs_pfhxs_c_Log2 [numeric]
Mean (sd) : -1.6 (1.3)
min ≤ med ≤ max:
-8.9 ≤ -1.4 ≤ 4.8
IQR (CV) : 1.7 (-0.8)
1061 distinct values 0 (0.0%)
14 hs_pfoa_c_Log2 [numeric]
Mean (sd) : 0.6 (0.6)
min ≤ med ≤ max:
-2.2 ≤ 0.6 ≤ 2.7
IQR (CV) : 0.7 (0.9)
1061 distinct values 0 (0.0%)
15 hs_pfos_c_Log2 [numeric]
Mean (sd) : 1 (1.1)
min ≤ med ≤ max:
-10.4 ≤ 1 ≤ 5.1
IQR (CV) : 1.3 (1.1)
1050 distinct values 0 (0.0%)
16 hs_prpa_cadj_Log2 [numeric]
Mean (sd) : -1.6 (3.8)
min ≤ med ≤ max:
-12 ≤ -2.3 ≤ 10.8
IQR (CV) : 5.2 (-2.4)
1031 distinct values 0 (0.0%)
17 hs_mbzp_cadj_Log2 [numeric]
Mean (sd) : 2.4 (1.2)
min ≤ med ≤ max:
-0.6 ≤ 2.3 ≤ 7.2
IQR (CV) : 1.5 (0.5)
1046 distinct values 0 (0.0%)
18 hs_mibp_cadj_Log2 [numeric]
Mean (sd) : 5.5 (1.1)
min ≤ med ≤ max:
2.3 ≤ 5.4 ≤ 9.8
IQR (CV) : 1.5 (0.2)
1057 distinct values 0 (0.0%)
19 hs_mnbp_cadj_Log2 [numeric]
Mean (sd) : 4.7 (1)
min ≤ med ≤ max:
1.9 ≤ 4.6 ≤ 8.9
IQR (CV) : 1.3 (0.2)
1048 distinct values 0 (0.0%)


#separate numeric and categorical data
numeric_chemical <- chemical_exposome %>% 
  dplyr::select(where(is.numeric))

numeric_chemical_long <- pivot_longer(
  numeric_chemical,
  cols = everything(),
  names_to = "variable",
  values_to = "value"
)

unique_numerical_vars <- unique(numeric_chemical_long$variable)

num_plots <- lapply(unique_numerical_vars, function(var) {
  data <- filter(numeric_chemical_long, variable == var)
  p <- ggplot(data, aes(x = value)) +
    geom_histogram(bins = 30, fill = "blue") +
    labs(title = paste("Histogram of", var), x = "Value", y = "Count")
  print(p)
  return(p)
})

Cadmium (hs_cd_c_Log2): After log2 transformation the distribution shows a long lower tail (minimum −10.4 versus a median of −3.8), so most participants cluster at similar levels while a small number have much lower measured values, possibly reflecting concentrations near the detection limit.

Cobalt (hs_co_c_Log2): The histogram of cobalt levels is roughly normal with a slight skew, suggesting a common exposure source with varying levels across the population.

Cesium (hs_cs_c_Log2): Exhibits a right-skewed distribution, indicating that most participants have relatively low exposure levels, but a small number have substantially higher exposures.

Copper (hs_cu_c_Log2): Shows a right-skewed distribution, suggesting that while most individuals have moderate exposure, a few experience significantly higher levels of copper.

Mercury (hs_hg_c_Log2): After log2 transformation most children cluster near the median (−0.2), with a long lower tail (minimum −10.9) indicating a minority with very low measured values.

Molybdenum (hs_mo_c_Log2): Shows a sharp peak with long tails in both directions (range −9.2 to 5.1 around a median of −0.4), suggesting most people have similar exposure levels while a few have markedly lower or higher values.

Lead (hs_pb_c_Log2): The distribution is slightly right-skewed, indicating higher exposure levels in a smaller group of the population compared to the majority.

DDE (hs_dde_cadj_Log2): Shows a pronounced right skew, typical for chemicals that accumulate in the environment and in human tissues, indicating higher levels of exposure in a smaller subset of the population.

PCB 153 (hs_pcb153_cadj_Log2): Has a distribution with right skewness, suggesting that exposure to these compounds is higher among a smaller segment of the population.

PCB 170 (hs_pcb170_cadj_Log2): This histogram shows a long lower tail after log transformation (minimum −16.8 versus a median of 0.3), indicating that while most children cluster at similar levels, a subset has far lower measured concentrations.

DEP and PBDE 153: These histograms mostly show multimodal distributions (more than one peak), suggesting different exposure sources or groups within the population that have distinct exposure levels. The multiple peaks could indicate varied exposure pathways or differences in how these chemicals are metabolized or retained in the body.

PFHxS and PFOA: These perfluorinated compounds display roughly normal distributions after log transformation, suggesting a common exposure source across the population, with a minority of individuals at higher levels.

PFOS (hs_pfos_c_Log2): The histogram shows a single sharp peak with a long lower tail (minimum −10.4 versus a median of 1), indicating that most individuals have similar exposure levels, with a small number of much lower values.

MBzP (hs_mbzp_cadj_Log2): This histogram shows a right-skewed distribution. Most values cluster at the lower end, with a long tail towards higher values suggesting occasional higher exposures.

MiBP (hs_mibp_cadj_Log2) and MnBP (hs_mnbp_cadj_Log2): Both butyl phthalate metabolites are approximately symmetric after log transformation, with a slight right skew (means of 5.5 and 4.7 just above their medians).

PRPA (hs_prpa_cadj_Log2): Propyl paraben shows the widest spread among the phenols examined (IQR 5.2) and a right skew (mean −1.6 above the median of −2.3), consistent with a subset of children having substantially higher exposures.

numeric_chemical <- dplyr::select(chemical_exposome, where(is.numeric))
cor_matrix <- cor(numeric_chemical, use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper", tl.col = "black", tl.srt = 90, tl.cex = 0.6)

Data Summary: Covariates

# Specified covariates
specific_covariates <- c(
  "e3_sex_None", 
  "e3_yearbir_None",
  "h_cohort", 
  "hs_child_age_None"
)

covariate_data <- dplyr::select(covariates, all_of(specific_covariates))
summarytools::view(dfSummary(covariate_data, style = 'grid', plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
No  Variable                      Stats / Values                      Freqs (% of Valid)    Missing
1   e3_sex_None [factor]          1. female                           608 (46.7%)           0 (0.0%)
                                  2. male                             693 (53.3%)
2   e3_yearbir_None [factor]      1. 2003                             55 (4.2%)             0 (0.0%)
                                  2. 2004                             107 (8.2%)
                                  3. 2005                             241 (18.5%)
                                  4. 2006                             256 (19.7%)
                                  5. 2007                             250 (19.2%)
                                  6. 2008                             379 (29.1%)
                                  7. 2009                             13 (1.0%)
3   h_cohort [factor]             1. 1                                202 (15.5%)           0 (0.0%)
                                  2. 2                                198 (15.2%)
                                  3. 3                                224 (17.2%)
                                  4. 4                                207 (15.9%)
                                  5. 5                                272 (20.9%)
                                  6. 6                                198 (15.2%)
4   hs_child_age_None [numeric]   Mean (sd): 8 (1.6)                  879 distinct values   0 (0.0%)
                                  min ≤ med ≤ max: 5.4 ≤ 8 ≤ 12.1
                                  IQR (CV): 2.4 (0.2)

Generated by summarytools 1.0.1 (R version 4.4.0)
2024-07-15

#separate numeric and categorical data
numeric_covariates <- covariate_data %>% 
  dplyr::select(where(is.numeric))

numeric_covariates_long <- pivot_longer(
  numeric_covariates,
  cols = everything(),
  names_to = "variable",
  values_to = "value"
)

unique_numerical_vars <- unique(numeric_covariates_long$variable)

num_plots <- lapply(unique_numerical_vars, function(var) {
  data <- filter(numeric_covariates_long, variable == var)
  p <- ggplot(data, aes(x = value)) +
    geom_histogram(bins = 30, fill = "blue") +
    labs(title = paste("Histogram of", var), x = "Value", y = "Count")
  print(p)
  return(p)
})

Child’s Age (hs_child_age): This histogram is multimodal, with peaks at distinct ages, likely reflecting the data collection points or the particular age groups recruited by the contributing cohorts.

categorical_covariates <- covariate_data %>% 
  dplyr::select(where(is.factor))

categorical_covariates_long <- pivot_longer(
  categorical_covariates,
  cols = everything(),
  names_to = "variable",
  values_to = "value"
)

unique_categorical_vars <- unique(categorical_covariates_long$variable)
categorical_plots <- lapply(unique_categorical_vars, function(var) {
  data <- filter(categorical_covariates_long, variable == var)
  
  p <- ggplot(data, aes(x = value, fill = value)) +
    geom_bar(stat = "count") +
    labs(title = paste("Distribution of", var), x = var, y = "Count")
  
  print(p)
  return(p)
})

Cohorts (h_cohort): The distribution shows the count of subjects across six different cohorts. All cohorts have a substantial number of subjects, with cohort 5 showing the highest participation.

Sex Distribution (e3_sex): The distribution is nearly balanced, with a slightly higher count of males (693) than females (608).

Year of Birth (e3_yearbir): This chart shows that most subjects were born in the later years, peaking in 2008 (29.1%), with only a handful born in 2009 (1.0%); this pattern likely reflects the recruitment windows of the contributing cohorts.

Data Summary Outcome: Phenotype

outcome_BMI <- phenotype %>% 
  dplyr::select(hs_zbmi_who)
summarytools::view(dfSummary(outcome_BMI, style = 'grid', plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
No  Variable                Stats / Values                       Freqs (% of Valid)    Missing
1   hs_zbmi_who [numeric]   Mean (sd): 0.4 (1.2)                 421 distinct values   0 (0.0%)
                            min ≤ med ≤ max: -3.6 ≤ 0.3 ≤ 4.7
                            IQR (CV): 1.5 (3)


# Combine all selected data
combined_data <- cbind(covariate_data, lifestyle_exposome, chemical_exposome, outcome_BMI)

# Ensure no duplicated columns
combined_data <- combined_data[, !duplicated(colnames(combined_data))]

# Convert sex variable to a factor for stratification
combined_data$e3_sex_None <- as.factor(combined_data$e3_sex_None)
# Factor levels are alphabetical ("female", "male"), so relabel in that order
levels(combined_data$e3_sex_None) <- c("Female", "Male")

render_cont <- function(x) {
  with(stats.default(x), sprintf("%0.2f (%0.2f)", MEAN, SD))
}

render_cat <- function(x) {
  c("", sapply(stats.default(x), function(y) with(y, sprintf("%d (%0.1f %%)", FREQ, PCT))))
}

# Define the formula for table1
table1_formula <- ~ 
  hs_child_age_None + e3_yearbir_None + h_cohort +
  hs_zbmi_who +
  h_bfdur_Ter + hs_bakery_prod_Ter + hs_break_cer_Ter + hs_dairy_Ter + hs_fastfood_Ter + hs_org_food_Ter +
  hs_proc_meat_Ter +
  hs_total_fish_Ter + hs_total_fruits_Ter + hs_total_lipids_Ter + hs_total_sweets_Ter + hs_total_veg_Ter +
  hs_cd_c_Log2 + hs_co_c_Log2 + hs_cs_c_Log2 + hs_cu_c_Log2 +
  hs_hg_c_Log2 + hs_mo_c_Log2 + hs_dde_cadj_Log2 + hs_pcb153_cadj_Log2 +
  hs_pcb170_cadj_Log2 + hs_dep_cadj_Log2 + hs_pbde153_cadj_Log2 +
  hs_pfhxs_c_Log2 + hs_pfoa_c_Log2 + hs_pfos_c_Log2 + hs_prpa_cadj_Log2 +
  hs_mbzp_cadj_Log2 + hs_mibp_cadj_Log2 + hs_mnbp_cadj_Log2 | e3_sex_None

# Create the table
table1(
  table1_formula,
  data = combined_data,
  render.continuous = render_cont,
  render.categorical = render_cat,
  overall = "Overall",
  topclass = "Rtable1-shade"
)
Female (N=608)
Male (N=693)
Overall (N=1301)
hs_child_age_None 7.91 (1.58) 8.03 (1.64) 7.98 (1.61)
e3_yearbir_None
2003 25 (4.1 %) 30 (4.3 %) 55 (4.2 %)
2004 46 (7.6 %) 61 (8.8 %) 107 (8.2 %)
2005 121 (19.9 %) 120 (17.3 %) 241 (18.5 %)
2006 108 (17.8 %) 148 (21.4 %) 256 (19.7 %)
2007 128 (21.1 %) 122 (17.6 %) 250 (19.2 %)
2008 177 (29.1 %) 202 (29.1 %) 379 (29.1 %)
2009 3 (0.5 %) 10 (1.4 %) 13 (1.0 %)
h_cohort
1 97 (16.0 %) 105 (15.2 %) 202 (15.5 %)
2 86 (14.1 %) 112 (16.2 %) 198 (15.2 %)
3 102 (16.8 %) 122 (17.6 %) 224 (17.2 %)
4 93 (15.3 %) 114 (16.5 %) 207 (15.9 %)
5 129 (21.2 %) 143 (20.6 %) 272 (20.9 %)
6 101 (16.6 %) 97 (14.0 %) 198 (15.2 %)
hs_zbmi_who 0.35 (1.15) 0.45 (1.22) 0.40 (1.19)
h_bfdur_Ter
(0,10.8] 231 (38.0 %) 275 (39.7 %) 506 (38.9 %)
(10.8,34.9] 118 (19.4 %) 152 (21.9 %) 270 (20.8 %)
(34.9,Inf] 259 (42.6 %) 266 (38.4 %) 525 (40.4 %)
hs_bakery_prod_Ter
(0,2] 164 (27.0 %) 181 (26.1 %) 345 (26.5 %)
(2,6] 188 (30.9 %) 235 (33.9 %) 423 (32.5 %)
(6,Inf] 256 (42.1 %) 277 (40.0 %) 533 (41.0 %)
hs_break_cer_Ter
(0,1.1] 141 (23.2 %) 150 (21.6 %) 291 (22.4 %)
(1.1,5.5] 251 (41.3 %) 270 (39.0 %) 521 (40.0 %)
(5.5,Inf] 216 (35.5 %) 273 (39.4 %) 489 (37.6 %)
hs_dairy_Ter
(0,14.6] 175 (28.8 %) 184 (26.6 %) 359 (27.6 %)
(14.6,25.6] 229 (37.7 %) 236 (34.1 %) 465 (35.7 %)
(25.6,Inf] 204 (33.6 %) 273 (39.4 %) 477 (36.7 %)
hs_fastfood_Ter
(0,0.132] 75 (12.3 %) 68 (9.8 %) 143 (11.0 %)
(0.132,0.5] 273 (44.9 %) 330 (47.6 %) 603 (46.3 %)
(0.5,Inf] 260 (42.8 %) 295 (42.6 %) 555 (42.7 %)
hs_org_food_Ter
(0,0.132] 211 (34.7 %) 218 (31.5 %) 429 (33.0 %)
(0.132,1] 191 (31.4 %) 205 (29.6 %) 396 (30.4 %)
(1,Inf] 206 (33.9 %) 270 (39.0 %) 476 (36.6 %)
hs_proc_meat_Ter
(0,1.5] 175 (28.8 %) 191 (27.6 %) 366 (28.1 %)
(1.5,4] 227 (37.3 %) 244 (35.2 %) 471 (36.2 %)
(4,Inf] 206 (33.9 %) 258 (37.2 %) 464 (35.7 %)
hs_total_fish_Ter
(0,1.5] 183 (30.1 %) 206 (29.7 %) 389 (29.9 %)
(1.5,3] 224 (36.8 %) 230 (33.2 %) 454 (34.9 %)
(3,Inf] 201 (33.1 %) 257 (37.1 %) 458 (35.2 %)
hs_total_fruits_Ter
(0,7] 174 (28.6 %) 239 (34.5 %) 413 (31.7 %)
(7,14.1] 216 (35.5 %) 191 (27.6 %) 407 (31.3 %)
(14.1,Inf] 218 (35.9 %) 263 (38.0 %) 481 (37.0 %)
hs_total_lipids_Ter
(0,3] 193 (31.7 %) 204 (29.4 %) 397 (30.5 %)
(3,7] 171 (28.1 %) 232 (33.5 %) 403 (31.0 %)
(7,Inf] 244 (40.1 %) 257 (37.1 %) 501 (38.5 %)
hs_total_sweets_Ter
(0,4.1] 149 (24.5 %) 195 (28.1 %) 344 (26.4 %)
(4.1,8.5] 251 (41.3 %) 265 (38.2 %) 516 (39.7 %)
(8.5,Inf] 208 (34.2 %) 233 (33.6 %) 441 (33.9 %)
hs_total_veg_Ter
(0,6] 190 (31.2 %) 214 (30.9 %) 404 (31.1 %)
(6,8.5] 136 (22.4 %) 178 (25.7 %) 314 (24.1 %)
(8.5,Inf] 282 (46.4 %) 301 (43.4 %) 583 (44.8 %)
hs_cd_c_Log2 -3.99 (0.98) -3.95 (1.09) -3.97 (1.04)
hs_co_c_Log2 -2.37 (0.61) -2.32 (0.64) -2.34 (0.63)
hs_cs_c_Log2 0.44 (0.58) 0.44 (0.57) 0.44 (0.57)
hs_cu_c_Log2 9.81 (0.25) 9.84 (0.22) 9.83 (0.23)
hs_hg_c_Log2 -0.24 (1.59) -0.35 (1.75) -0.30 (1.68)
hs_mo_c_Log2 -0.32 (0.83) -0.31 (0.96) -0.32 (0.90)
hs_dde_cadj_Log2 4.63 (1.48) 4.70 (1.50) 4.67 (1.49)
hs_pcb153_cadj_Log2 3.47 (0.86) 3.63 (0.94) 3.56 (0.90)
hs_pcb170_cadj_Log2 -0.60 (3.22) -0.05 (2.77) -0.31 (3.00)
hs_dep_cadj_Log2 0.27 (3.16) 0.06 (3.25) 0.16 (3.21)
hs_pbde153_cadj_Log2 -4.66 (3.86) -4.40 (3.80) -4.53 (3.83)
hs_pfhxs_c_Log2 -1.62 (1.30) -1.53 (1.31) -1.57 (1.31)
hs_pfoa_c_Log2 0.60 (0.55) 0.62 (0.56) 0.61 (0.55)
hs_pfos_c_Log2 0.95 (1.15) 0.99 (1.08) 0.97 (1.11)
hs_prpa_cadj_Log2 -1.26 (3.96) -1.91 (3.68) -1.61 (3.82)
hs_mbzp_cadj_Log2 2.42 (1.23) 2.47 (1.22) 2.44 (1.22)
hs_mibp_cadj_Log2 5.54 (1.09) 5.39 (1.12) 5.46 (1.11)
hs_mnbp_cadj_Log2 4.77 (1.08) 4.60 (0.96) 4.68 (1.02)
combined_data$h_cohort <- as.factor(combined_data$h_cohort)
# Create the table
table1(
  ~ hs_child_age_None + e3_sex_None + e3_yearbir_None +
    hs_zbmi_who + h_bfdur_Ter + hs_bakery_prod_Ter +
    hs_break_cer_Ter + hs_dairy_Ter + hs_fastfood_Ter +
    hs_org_food_Ter + hs_proc_meat_Ter + hs_total_fish_Ter + hs_total_fruits_Ter +
    hs_total_lipids_Ter +
    hs_total_sweets_Ter + hs_total_veg_Ter +
    hs_cd_c_Log2 + hs_co_c_Log2 + hs_cs_c_Log2 + hs_cu_c_Log2 +
    hs_hg_c_Log2 + hs_mo_c_Log2 + hs_dde_cadj_Log2 + hs_pcb153_cadj_Log2 +
    hs_pcb170_cadj_Log2 + hs_dep_cadj_Log2 + hs_pbde153_cadj_Log2 +
    hs_pfhxs_c_Log2 + hs_pfoa_c_Log2 + hs_pfos_c_Log2 + hs_prpa_cadj_Log2 +
    hs_mbzp_cadj_Log2 + hs_mibp_cadj_Log2 + hs_mnbp_cadj_Log2 | h_cohort,
    data = combined_data,
  render.continuous = render_cont,
  render.categorical = render_cat,
  overall = "Overall",
  topclass = "Rtable1-shade"
)
1 (N=202)
2 (N=198)
3 (N=224)
4 (N=207)
5 (N=272)
6 (N=198)
Overall (N=1301)
hs_child_age_None 6.61 (0.28) 10.82 (0.58) 8.78 (0.58) 6.48 (0.47) 8.46 (0.53) 6.51 (0.30) 7.98 (1.61)
e3_sex_None
Female 97 (48.0 %) 86 (43.4 %) 102 (45.5 %) 93 (44.9 %) 129 (47.4 %) 101 (51.0 %) 608 (46.7 %)
Male 105 (52.0 %) 112 (56.6 %) 122 (54.5 %) 114 (55.1 %) 143 (52.6 %) 97 (49.0 %) 693 (53.3 %)
e3_yearbir_None
2003 0 (0.0 %) 55 (27.8 %) 0 (0.0 %) 0 (0.0 %) 0 (0.0 %) 0 (0.0 %) 55 (4.2 %)
2004 0 (0.0 %) 107 (54.0 %) 0 (0.0 %) 0 (0.0 %) 0 (0.0 %) 0 (0.0 %) 107 (8.2 %)
2005 0 (0.0 %) 36 (18.2 %) 120 (53.6 %) 0 (0.0 %) 85 (31.2 %) 0 (0.0 %) 241 (18.5 %)
2006 0 (0.0 %) 0 (0.0 %) 99 (44.2 %) 0 (0.0 %) 157 (57.7 %) 0 (0.0 %) 256 (19.7 %)
2007 82 (40.6 %) 0 (0.0 %) 5 (2.2 %) 62 (30.0 %) 30 (11.0 %) 71 (35.9 %) 250 (19.2 %)
2008 117 (57.9 %) 0 (0.0 %) 0 (0.0 %) 136 (65.7 %) 0 (0.0 %) 126 (63.6 %) 379 (29.1 %)
2009 3 (1.5 %) 0 (0.0 %) 0 (0.0 %) 9 (4.3 %) 0 (0.0 %) 1 (0.5 %) 13 (1.0 %)
hs_zbmi_who 0.20 (1.15) 0.19 (1.13) 0.80 (1.22) 0.52 (1.22) 0.09 (0.90) 0.68 (1.37) 0.40 (1.19)
h_bfdur_Ter
(0,10.8] 74 (36.6 %) 119 (60.1 %) 70 (31.2 %) 58 (28.0 %) 101 (37.1 %) 84 (42.4 %) 506 (38.9 %)
(10.8,34.9] 2 (1.0 %) 57 (28.8 %) 100 (44.6 %) 30 (14.5 %) 0 (0.0 %) 81 (40.9 %) 270 (20.8 %)
(34.9,Inf] 126 (62.4 %) 22 (11.1 %) 54 (24.1 %) 119 (57.5 %) 171 (62.9 %) 33 (16.7 %) 525 (40.4 %)
hs_bakery_prod_Ter
(0,2] 29 (14.4 %) 41 (20.7 %) 39 (17.4 %) 34 (16.4 %) 187 (68.8 %) 15 (7.6 %) 345 (26.5 %)
(2,6] 66 (32.7 %) 51 (25.8 %) 89 (39.7 %) 84 (40.6 %) 74 (27.2 %) 59 (29.8 %) 423 (32.5 %)
(6,Inf] 107 (53.0 %) 106 (53.5 %) 96 (42.9 %) 89 (43.0 %) 11 (4.0 %) 124 (62.6 %) 533 (41.0 %)
hs_break_cer_Ter
(0,1.1] 18 (8.9 %) 65 (32.8 %) 61 (27.2 %) 38 (18.4 %) 57 (21.0 %) 52 (26.3 %) 291 (22.4 %)
(1.1,5.5] 55 (27.2 %) 67 (33.8 %) 89 (39.7 %) 101 (48.8 %) 114 (41.9 %) 95 (48.0 %) 521 (40.0 %)
(5.5,Inf] 129 (63.9 %) 66 (33.3 %) 74 (33.0 %) 68 (32.9 %) 101 (37.1 %) 51 (25.8 %) 489 (37.6 %)
hs_dairy_Ter
(0,14.6] 21 (10.4 %) 41 (20.7 %) 55 (24.6 %) 128 (61.8 %) 76 (27.9 %) 38 (19.2 %) 359 (27.6 %)
(14.6,25.6] 86 (42.6 %) 49 (24.7 %) 99 (44.2 %) 51 (24.6 %) 91 (33.5 %) 89 (44.9 %) 465 (35.7 %)
(25.6,Inf] 95 (47.0 %) 108 (54.5 %) 70 (31.2 %) 28 (13.5 %) 105 (38.6 %) 71 (35.9 %) 477 (36.7 %)
hs_fastfood_Ter
(0,0.132] 18 (8.9 %) 23 (11.6 %) 18 (8.0 %) 51 (24.6 %) 24 (8.8 %) 9 (4.5 %) 143 (11.0 %)
(0.132,0.5] 40 (19.8 %) 101 (51.0 %) 127 (56.7 %) 106 (51.2 %) 169 (62.1 %) 60 (30.3 %) 603 (46.3 %)
(0.5,Inf] 144 (71.3 %) 74 (37.4 %) 79 (35.3 %) 50 (24.2 %) 79 (29.0 %) 129 (65.2 %) 555 (42.7 %)
hs_org_food_Ter
(0,0.132] 114 (56.4 %) 51 (25.8 %) 118 (52.7 %) 19 (9.2 %) 9 (3.3 %) 118 (59.6 %) 429 (33.0 %)
(0.132,1] 40 (19.8 %) 73 (36.9 %) 70 (31.2 %) 75 (36.2 %) 109 (40.1 %) 29 (14.6 %) 396 (30.4 %)
(1,Inf] 48 (23.8 %) 74 (37.4 %) 36 (16.1 %) 113 (54.6 %) 154 (56.6 %) 51 (25.8 %) 476 (36.6 %)
hs_proc_meat_Ter
(0,1.5] 118 (58.4 %) 47 (23.7 %) 25 (11.2 %) 83 (40.1 %) 39 (14.3 %) 54 (27.3 %) 366 (28.1 %)
(1.5,4] 32 (15.8 %) 90 (45.5 %) 85 (37.9 %) 71 (34.3 %) 85 (31.2 %) 108 (54.5 %) 471 (36.2 %)
(4,Inf] 52 (25.7 %) 61 (30.8 %) 114 (50.9 %) 53 (25.6 %) 148 (54.4 %) 36 (18.2 %) 464 (35.7 %)
hs_total_fish_Ter
(0,1.5] 82 (40.6 %) 38 (19.2 %) 25 (11.2 %) 130 (62.8 %) 38 (14.0 %) 76 (38.4 %) 389 (29.9 %)
(1.5,3] 53 (26.2 %) 103 (52.0 %) 47 (21.0 %) 57 (27.5 %) 94 (34.6 %) 100 (50.5 %) 454 (34.9 %)
(3,Inf] 67 (33.2 %) 57 (28.8 %) 152 (67.9 %) 20 (9.7 %) 140 (51.5 %) 22 (11.1 %) 458 (35.2 %)
hs_total_fruits_Ter
(0,7] 26 (12.9 %) 107 (54.0 %) 83 (37.1 %) 99 (47.8 %) 35 (12.9 %) 63 (31.8 %) 413 (31.7 %)
(7,14.1] 42 (20.8 %) 45 (22.7 %) 85 (37.9 %) 64 (30.9 %) 82 (30.1 %) 89 (44.9 %) 407 (31.3 %)
(14.1,Inf] 134 (66.3 %) 46 (23.2 %) 56 (25.0 %) 44 (21.3 %) 155 (57.0 %) 46 (23.2 %) 481 (37.0 %)
hs_total_lipids_Ter
(0,3] 18 (8.9 %) 31 (15.7 %) 151 (67.4 %) 24 (11.6 %) 32 (11.8 %) 141 (71.2 %) 397 (30.5 %)
(3,7] 72 (35.6 %) 90 (45.5 %) 40 (17.9 %) 74 (35.7 %) 82 (30.1 %) 45 (22.7 %) 403 (31.0 %)
(7,Inf] 112 (55.4 %) 77 (38.9 %) 33 (14.7 %) 109 (52.7 %) 158 (58.1 %) 12 (6.1 %) 501 (38.5 %)
hs_total_sweets_Ter
(0,4.1] 50 (24.8 %) 39 (19.7 %) 93 (41.5 %) 19 (9.2 %) 89 (32.7 %) 54 (27.3 %) 344 (26.4 %)
(4.1,8.5] 77 (38.1 %) 61 (30.8 %) 88 (39.3 %) 58 (28.0 %) 125 (46.0 %) 107 (54.0 %) 516 (39.7 %)
(8.5,Inf] 75 (37.1 %) 98 (49.5 %) 43 (19.2 %) 130 (62.8 %) 58 (21.3 %) 37 (18.7 %) 441 (33.9 %)
hs_total_veg_Ter
(0,6] 65 (32.2 %) 53 (26.8 %) 94 (42.0 %) 81 (39.1 %) 42 (15.4 %) 69 (34.8 %) 404 (31.1 %)
(6,8.5] 41 (20.3 %) 42 (21.2 %) 69 (30.8 %) 53 (25.6 %) 57 (21.0 %) 52 (26.3 %) 314 (24.1 %)
(8.5,Inf] 96 (47.5 %) 103 (52.0 %) 61 (27.2 %) 73 (35.3 %) 173 (63.6 %) 77 (38.9 %) 583 (44.8 %)
hs_cd_c_Log2 -3.87 (0.84) -4.06 (1.22) -4.22 (1.23) -4.16 (1.11) -3.60 (0.74) -3.99 (0.91) -3.97 (1.04)
hs_co_c_Log2 -2.31 (0.52) -2.38 (0.56) -2.46 (0.64) -2.37 (0.64) -2.53 (0.64) -1.93 (0.56) -2.34 (0.63)
hs_cs_c_Log2 0.12 (0.45) 1.01 (0.47) 0.61 (0.45) -0.17 (0.39) 0.71 (0.40) 0.29 (0.39) 0.44 (0.57)
hs_cu_c_Log2 9.86 (0.23) 9.88 (0.25) 9.83 (0.20) 9.80 (0.21) 9.71 (0.21) 9.93 (0.21) 9.83 (0.23)
hs_hg_c_Log2 -0.56 (1.59) 0.67 (1.29) 0.92 (1.30) -1.97 (1.49) -0.34 (1.06) -0.57 (1.69) -0.30 (1.68)
hs_mo_c_Log2 -0.13 (0.79) -0.58 (1.18) -0.55 (0.77) -0.42 (0.84) -0.17 (0.74) -0.07 (0.95) -0.32 (0.90)
hs_dde_cadj_Log2 3.81 (1.31) 4.01 (1.28) 4.36 (1.24) 5.67 (1.29) 4.26 (0.94) 6.06 (1.41) 4.67 (1.49)
hs_pcb153_cadj_Log2 2.73 (0.63) 3.50 (0.76) 3.66 (0.84) 3.93 (0.85) 4.22 (0.69) 3.03 (0.68) 3.56 (0.90)
hs_pcb170_cadj_Log2 -2.44 (3.33) 0.33 (1.89) 0.41 (2.42) -0.81 (3.58) 1.38 (1.63) -1.38 (3.14) -0.31 (3.00)
hs_dep_cadj_Log2 1.44 (3.30) -0.27 (3.31) -0.15 (3.07) -1.42 (3.25) 0.62 (2.85) 0.66 (2.82) 0.16 (3.21)
hs_pbde153_cadj_Log2 -3.39 (3.79) -5.11 (3.61) -5.05 (3.83) -4.86 (3.78) -2.66 (3.00) -6.71 (3.67) -4.53 (3.83)
hs_pfhxs_c_Log2 -1.48 (1.03) -0.51 (0.83) -1.55 (0.88) -2.69 (1.19) -0.66 (0.76) -2.83 (1.08) -1.57 (1.31)
hs_pfoa_c_Log2 0.86 (0.50) 0.56 (0.53) 0.52 (0.51) 0.42 (0.61) 0.80 (0.43) 0.46 (0.61) 0.61 (0.55)
hs_pfos_c_Log2 0.57 (0.90) 1.64 (0.78) 0.43 (0.97) 0.19 (1.29) 1.67 (0.75) 1.16 (0.88) 0.97 (1.11)
hs_prpa_cadj_Log2 -0.05 (3.69) -2.65 (3.49) 0.69 (3.83) -2.00 (3.98) -3.14 (2.92) -2.22 (3.50) -1.61 (3.82)
hs_mbzp_cadj_Log2 1.60 (1.16) 2.81 (1.19) 2.52 (1.09) 2.81 (1.11) 2.17 (1.11) 2.85 (1.23) 2.44 (1.22)
hs_mibp_cadj_Log2 6.07 (1.02) 5.47 (1.07) 4.88 (0.90) 6.27 (0.87) 4.74 (0.99) 5.63 (0.83) 5.46 (1.11)
hs_mnbp_cadj_Log2 4.74 (0.90) 4.24 (0.86) 3.99 (0.79) 5.47 (0.86) 4.79 (0.89) 4.84 (1.12) 4.68 (1.02)

All Data of Interest

outcome_cov <- cbind(covariate_data, outcome_BMI)
outcome_cov <- outcome_cov[, !duplicated(colnames(outcome_cov))]
#the full chemicals list
chemicals_full <- c(
  "hs_as_c_Log2",
  "hs_cd_c_Log2",
  "hs_co_c_Log2",
  "hs_cs_c_Log2",
  "hs_cu_c_Log2",
  "hs_hg_c_Log2",
  "hs_mn_c_Log2",
  "hs_mo_c_Log2",
  "hs_pb_c_Log2",
  "hs_tl_cdich_None",
  "hs_dde_cadj_Log2",
  "hs_ddt_cadj_Log2",
  "hs_hcb_cadj_Log2",
  "hs_pcb118_cadj_Log2",
  "hs_pcb138_cadj_Log2",
  "hs_pcb153_cadj_Log2",
  "hs_pcb170_cadj_Log2",
  "hs_pcb180_cadj_Log2",
  "hs_dep_cadj_Log2",
  "hs_detp_cadj_Log2",
  "hs_dmdtp_cdich_None",
  "hs_dmp_cadj_Log2",
  "hs_dmtp_cadj_Log2",
  "hs_pbde153_cadj_Log2",
  "hs_pbde47_cadj_Log2",
  "hs_pfhxs_c_Log2",
  "hs_pfna_c_Log2",
  "hs_pfoa_c_Log2",
  "hs_pfos_c_Log2",
  "hs_pfunda_c_Log2",
  "hs_bpa_cadj_Log2",
  "hs_bupa_cadj_Log2",
  "hs_etpa_cadj_Log2",
  "hs_mepa_cadj_Log2",
  "hs_oxbe_cadj_Log2",
  "hs_prpa_cadj_Log2",
  "hs_trcs_cadj_Log2",
  "hs_mbzp_cadj_Log2",
  "hs_mecpp_cadj_Log2",
  "hs_mehhp_cadj_Log2",
  "hs_mehp_cadj_Log2",
  "hs_meohp_cadj_Log2",
  "hs_mep_cadj_Log2",
  "hs_mibp_cadj_Log2",
  "hs_mnbp_cadj_Log2",
  "hs_ohminp_cadj_Log2",
  "hs_oxominp_cadj_Log2",
  "hs_cotinine_cdich_None",
  "hs_globalexp2_None"
)

#postnatal diet for child
postnatal_diet <- c(
  "h_bfdur_Ter",
  "hs_bakery_prod_Ter",
  "hs_beverages_Ter",
  "hs_break_cer_Ter",
  "hs_caff_drink_Ter",
  "hs_dairy_Ter",
  "hs_fastfood_Ter",
  "hs_org_food_Ter",
  "hs_proc_meat_Ter",
  "hs_readymade_Ter",
  "hs_total_bread_Ter",
  "hs_total_cereal_Ter",
  "hs_total_fish_Ter",
  "hs_total_fruits_Ter",
  "hs_total_lipids_Ter",
  "hs_total_meat_Ter",
  "hs_total_potatoes_Ter",
  "hs_total_sweets_Ter",
  "hs_total_veg_Ter",
  "hs_total_yog_Ter"
)

all_chemicals <- exposome %>% dplyr::select(all_of(chemicals_full))

all_diet <- exposome %>% dplyr::select(all_of(postnatal_diet))

all_columns <- c(chemicals_full, postnatal_diet)
extracted_exposome <- exposome %>% dplyr::select(all_of(all_columns))

chemicals_outcome_cov <- cbind(outcome_cov, all_chemicals)

diet_outcome_cov <- cbind(outcome_cov, all_diet)

interested_data <- cbind(outcome_cov, extracted_exposome)
head(interested_data)
interested_data_corr <- select_if(interested_data, is.numeric)
# Pearson correlations on complete cases (a single call; successive calls would just overwrite each other)
cor_matrix <- cor(interested_data_corr, method = "pearson", use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper", tl.col = "black", tl.srt = 90, tl.cex = 0.4)
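At this label size the corrplot is hard to read, so it helps to also list the strongly correlated pairs explicitly. A base-R sketch using the cor_matrix computed above (the 0.8 cutoff is an arbitrary choice):

```r
# Report exposure pairs with |r| > 0.8; upper.tri() avoids duplicates and the diagonal.
high_corr <- which(abs(cor_matrix) > 0.8 & upper.tri(cor_matrix), arr.ind = TRUE)
data.frame(
  var1 = rownames(cor_matrix)[high_corr[, 1]],
  var2 = colnames(cor_matrix)[high_corr[, 2]],
  r    = cor_matrix[high_corr]
)
```

Such pairs (e.g. among the phthalate metabolites) are exactly where LASSO tends to keep one variable and zero out its correlated partners.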

Comparing Models with and without Covariates

Chemicals Data

Predicting LASSO

#LASSO train/test 70-30
set.seed(101)
train_indices <- sample(seq_len(nrow(chemicals_outcome_cov)), size = floor(0.7 * nrow(chemicals_outcome_cov)))
test_indices <- setdiff(seq_len(nrow(chemicals_outcome_cov)), train_indices)

# model.matrix (rather than as.matrix) dummy-codes factor covariates
# instead of coercing the whole data frame to a character matrix
x_train <- model.matrix(~ . + 0, data = chemicals_outcome_cov[train_indices, setdiff(names(chemicals_outcome_cov), "hs_zbmi_who")])
y_train <- chemicals_outcome_cov$hs_zbmi_who[train_indices]

x_test <- model.matrix(~ . + 0, data = chemicals_outcome_cov[test_indices, setdiff(names(chemicals_outcome_cov), "hs_zbmi_who")])
y_test <- chemicals_outcome_cov$hs_zbmi_who[test_indices]

x_train_chemicals_only <- as.matrix(chemicals_outcome_cov[train_indices, chemicals_full])
x_test_chemicals_only <- as.matrix(chemicals_outcome_cov[test_indices, chemicals_full])

fit_without_covariates_train <- cv.glmnet(x_train_chemicals_only, y_train, alpha = 1, family = "gaussian")
fit_without_covariates_test <- predict(fit_without_covariates_train, s = "lambda.min", newx = x_test_chemicals_only)
test_mse_without_covariates <- mean((y_test - fit_without_covariates_test)^2)

plot(fit_without_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")

best_lambda <- fit_without_covariates_train$lambda.min  # lambda that minimizes the MSE
coef(fit_without_covariates_train, s = best_lambda)
## 50 x 1 sparse Matrix of class "dgCMatrix"
##                                   s1
## (Intercept)            -4.7797230131
## hs_as_c_Log2            .           
## hs_cd_c_Log2           -0.0238815730
## hs_co_c_Log2           -0.0011670319
## hs_cs_c_Log2            0.0771865955
## hs_cu_c_Log2            0.6071183261
## hs_hg_c_Log2           -0.0075730086
## hs_mn_c_Log2            .           
## hs_mo_c_Log2           -0.0992489424
## hs_pb_c_Log2           -0.0056257448
## hs_tl_cdich_None        .           
## hs_dde_cadj_Log2       -0.0378984008
## hs_ddt_cadj_Log2        .           
## hs_hcb_cadj_Log2        .           
## hs_pcb118_cadj_Log2     .           
## hs_pcb138_cadj_Log2     .           
## hs_pcb153_cadj_Log2    -0.1721262187
## hs_pcb170_cadj_Log2    -0.0557570999
## hs_pcb180_cadj_Log2     .           
## hs_dep_cadj_Log2       -0.0186165147
## hs_detp_cadj_Log2       .           
## hs_dmdtp_cdich_None     .           
## hs_dmp_cadj_Log2        .           
## hs_dmtp_cadj_Log2       .           
## hs_pbde153_cadj_Log2   -0.0357794002
## hs_pbde47_cadj_Log2     .           
## hs_pfhxs_c_Log2        -0.0019079468
## hs_pfna_c_Log2          .           
## hs_pfoa_c_Log2         -0.1360824261
## hs_pfos_c_Log2         -0.0478302901
## hs_pfunda_c_Log2        .           
## hs_bpa_cadj_Log2        .           
## hs_bupa_cadj_Log2       .           
## hs_etpa_cadj_Log2       .           
## hs_mepa_cadj_Log2       .           
## hs_oxbe_cadj_Log2       0.0008622765
## hs_prpa_cadj_Log2       0.0011728557
## hs_trcs_cadj_Log2       .           
## hs_mbzp_cadj_Log2       0.0373221816
## hs_mecpp_cadj_Log2      .           
## hs_mehhp_cadj_Log2      .           
## hs_mehp_cadj_Log2       .           
## hs_meohp_cadj_Log2      .           
## hs_mep_cadj_Log2        .           
## hs_mibp_cadj_Log2      -0.0477304169
## hs_mnbp_cadj_Log2      -0.0036235331
## hs_ohminp_cadj_Log2     .           
## hs_oxominp_cadj_Log2    .           
## hs_cotinine_cdich_None  .           
## hs_globalexp2_None      .
cat("Model without Covariates - Test MSE:", test_mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.231997
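Rather than scanning the sparse matrix by eye, the nonzero LASSO coefficients can be pulled out programmatically; a short sketch using the cv.glmnet fit above:

```r
# Extract nonzero coefficients at lambda.min, sorted by absolute magnitude.
coefs <- coef(fit_without_covariates_train, s = "lambda.min")
nonzero <- coefs[as.vector(coefs != 0), , drop = FALSE]
nonzero[order(-abs(nonzero[, 1])), , drop = FALSE]
```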

Predicting Ridge

# RIDGE
fit_without_covariates_train <- cv.glmnet(x_train_chemicals_only, y_train, alpha = 0, family = "gaussian")
fit_without_covariates_test <- predict(fit_without_covariates_train, s = "lambda.min", newx = x_test_chemicals_only)
test_mse_without_covariates <- mean((y_test - fit_without_covariates_test)^2)

plot(fit_without_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")

best_lambda <- fit_without_covariates_train$lambda.min  # lambda that minimizes the MSE
coef(fit_without_covariates_train, s = best_lambda)
## 50 x 1 sparse Matrix of class "dgCMatrix"
##                                   s1
## (Intercept)            -4.469806e+00
## hs_as_c_Log2            6.590433e-03
## hs_cd_c_Log2           -4.093355e-02
## hs_co_c_Log2           -5.049922e-02
## hs_cs_c_Log2            1.230373e-01
## hs_cu_c_Log2            6.078479e-01
## hs_hg_c_Log2           -3.225520e-02
## hs_mn_c_Log2           -3.089195e-02
## hs_mo_c_Log2           -1.068154e-01
## hs_pb_c_Log2           -5.295956e-02
## hs_tl_cdich_None        .           
## hs_dde_cadj_Log2       -4.888006e-02
## hs_ddt_cadj_Log2        4.045085e-03
## hs_hcb_cadj_Log2       -1.857150e-02
## hs_pcb118_cadj_Log2     1.400112e-02
## hs_pcb138_cadj_Log2    -3.614513e-02
## hs_pcb153_cadj_Log2    -1.223407e-01
## hs_pcb170_cadj_Log2    -5.267521e-02
## hs_pcb180_cadj_Log2    -1.074695e-02
## hs_dep_cadj_Log2       -2.548881e-02
## hs_detp_cadj_Log2       8.051621e-03
## hs_dmdtp_cdich_None     .           
## hs_dmp_cadj_Log2       -2.097690e-03
## hs_dmtp_cadj_Log2       7.300567e-05
## hs_pbde153_cadj_Log2   -3.315313e-02
## hs_pbde47_cadj_Log2     5.273953e-03
## hs_pfhxs_c_Log2        -2.966308e-02
## hs_pfna_c_Log2          2.336166e-02
## hs_pfoa_c_Log2         -1.519872e-01
## hs_pfos_c_Log2         -6.495855e-02
## hs_pfunda_c_Log2        1.248503e-02
## hs_bpa_cadj_Log2        3.832688e-04
## hs_bupa_cadj_Log2       6.588467e-03
## hs_etpa_cadj_Log2      -6.098679e-03
## hs_mepa_cadj_Log2      -1.638466e-02
## hs_oxbe_cadj_Log2       1.390524e-02
## hs_prpa_cadj_Log2       1.258510e-02
## hs_trcs_cadj_Log2       2.878805e-03
## hs_mbzp_cadj_Log2       5.550048e-02
## hs_mecpp_cadj_Log2      1.627174e-03
## hs_mehhp_cadj_Log2      2.316991e-02
## hs_mehp_cadj_Log2      -1.662304e-02
## hs_meohp_cadj_Log2      1.137436e-02
## hs_mep_cadj_Log2        3.371106e-03
## hs_mibp_cadj_Log2      -5.391219e-02
## hs_mnbp_cadj_Log2      -4.383016e-02
## hs_ohminp_cadj_Log2    -2.886768e-02
## hs_oxominp_cadj_Log2    2.204660e-02
## hs_cotinine_cdich_None  .           
## hs_globalexp2_None      .
cat("Model without Covariates - Test MSE:", test_mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.188752

Predicting Elastic Net

# ELASTIC NET
fit_without_covariates_train <- cv.glmnet(x_train_chemicals_only, y_train, alpha = 0.5, family = "gaussian")
fit_without_covariates_test <- predict(fit_without_covariates_train, s = "lambda.min", newx = x_test_chemicals_only)
test_mse_without_covariates <- mean((y_test - fit_without_covariates_test)^2)

plot(fit_without_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")

best_lambda <- fit_without_covariates_train$lambda.min  # lambda that minimizes the MSE
coef(fit_without_covariates_train, s = best_lambda)
## 50 x 1 sparse Matrix of class "dgCMatrix"
##                                  s1
## (Intercept)            -4.785950188
## hs_as_c_Log2            .          
## hs_cd_c_Log2           -0.025843356
## hs_co_c_Log2           -0.005835867
## hs_cs_c_Log2            0.084715330
## hs_cu_c_Log2            0.607379616
## hs_hg_c_Log2           -0.009800093
## hs_mn_c_Log2            .          
## hs_mo_c_Log2           -0.099724922
## hs_pb_c_Log2           -0.010318890
## hs_tl_cdich_None        .          
## hs_dde_cadj_Log2       -0.039528137
## hs_ddt_cadj_Log2        .          
## hs_hcb_cadj_Log2        .          
## hs_pcb118_cadj_Log2     .          
## hs_pcb138_cadj_Log2     .          
## hs_pcb153_cadj_Log2    -0.169008355
## hs_pcb170_cadj_Log2    -0.055808065
## hs_pcb180_cadj_Log2     .          
## hs_dep_cadj_Log2       -0.019034348
## hs_detp_cadj_Log2       .          
## hs_dmdtp_cdich_None     .          
## hs_dmp_cadj_Log2        .          
## hs_dmtp_cadj_Log2       .          
## hs_pbde153_cadj_Log2   -0.035464586
## hs_pbde47_cadj_Log2     .          
## hs_pfhxs_c_Log2        -0.006816020
## hs_pfna_c_Log2          .          
## hs_pfoa_c_Log2         -0.135997766
## hs_pfos_c_Log2         -0.047692264
## hs_pfunda_c_Log2        .          
## hs_bpa_cadj_Log2        .          
## hs_bupa_cadj_Log2       .          
## hs_etpa_cadj_Log2       .          
## hs_mepa_cadj_Log2       .          
## hs_oxbe_cadj_Log2       0.002529961
## hs_prpa_cadj_Log2       0.001735800
## hs_trcs_cadj_Log2       .          
## hs_mbzp_cadj_Log2       0.040317847
## hs_mecpp_cadj_Log2      .          
## hs_mehhp_cadj_Log2      .          
## hs_mehp_cadj_Log2       .          
## hs_meohp_cadj_Log2      .          
## hs_mep_cadj_Log2        .          
## hs_mibp_cadj_Log2      -0.047892677
## hs_mnbp_cadj_Log2      -0.008483913
## hs_ohminp_cadj_Log2     .          
## hs_oxominp_cadj_Log2    .          
## hs_cotinine_cdich_None  .          
## hs_globalexp2_None      .
cat("Model without Covariates - Test MSE:", test_mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.228805
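The test MSEs above (roughly 1.19–1.23) are easier to interpret against a naive baseline that always predicts the training-set mean; a minimal sketch reusing the split defined earlier:

```r
# Intercept-only baseline: predict mean(y_train) for every test observation.
baseline_mse <- mean((y_test - mean(y_train))^2)
cat("Intercept-only baseline - Test MSE:", baseline_mse, "\n")
```

If the regularized models beat this number only marginally, the chemical exposures explain little out-of-sample variance in BMI z-score.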
#selected chemicals that were noted in enet
chemicals_selected <- c(
  "hs_cd_c_Log2",
  "hs_co_c_Log2",
  "hs_cs_c_Log2",
  "hs_cu_c_Log2",
  "hs_hg_c_Log2",
  "hs_mo_c_Log2",
  "hs_pb_c_Log2",
  "hs_dde_cadj_Log2",
  "hs_pcb153_cadj_Log2",
  "hs_pcb170_cadj_Log2",
  "hs_dep_cadj_Log2",
  "hs_detp_cadj_Log2",
  "hs_pbde153_cadj_Log2",
  "hs_pfhxs_c_Log2",
  "hs_pfoa_c_Log2",
  "hs_pfos_c_Log2",
  "hs_mepa_cadj_Log2",
  "hs_oxbe_cadj_Log2",
  "hs_prpa_cadj_Log2",
  "hs_mbzp_cadj_Log2",
  "hs_mibp_cadj_Log2",
  "hs_mnbp_cadj_Log2")

The chemical features above were selected based on the elastic net feature selection.
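The elastic net above fixes alpha at 0.5; alpha itself can also be cross-validated over a grid. A sketch reusing the chemical-only training matrix (a shared foldid is passed to every cv.glmnet call so the errors are comparable across alpha values):

```r
# Grid search over the elastic net mixing parameter alpha.
alphas <- seq(0, 1, by = 0.1)
set.seed(101)
foldid <- sample(rep(1:10, length.out = length(y_train)))  # common CV folds
cv_errors <- sapply(alphas, function(a) {
  fit <- cv.glmnet(x_train_chemicals_only, y_train, alpha = a,
                   family = "gaussian", foldid = foldid)
  min(fit$cvm)
})
alphas[which.min(cv_errors)]  # alpha with the lowest cross-validated error
```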

Postnatal Diet Data

Predicting LASSO

# LASSO with train/test
set.seed(101)  
train_indices <- sample(seq_len(nrow(diet_outcome_cov)), size = floor(0.7 * nrow(diet_outcome_cov)))
test_indices <- setdiff(seq_len(nrow(diet_outcome_cov)), train_indices)

diet_data <- diet_outcome_cov[, postnatal_diet]
x_diet_train <- model.matrix(~ . + 0, data = diet_data[train_indices, ])  
x_diet_test <- model.matrix(~ . + 0, data = diet_data[test_indices, ])  

# Use a distinct name so the original `covariates` data frame is not overwritten
covariate_subset <- diet_outcome_cov[, c("e3_sex_None", "e3_yearbir_None", "h_cohort", "hs_child_age_None")]
x_covariates_train <- model.matrix(~ . + 0, data = covariate_subset[train_indices, ]) 
x_covariates_test <- model.matrix(~ . + 0, data = covariate_subset[test_indices, ])

x_full_train <- cbind(x_diet_train, x_covariates_train)
x_full_test <- cbind(x_diet_test, x_covariates_test)

x_full_train[is.na(x_full_train)] <- 0
x_full_test[is.na(x_full_test)] <- 0
x_diet_train[is.na(x_diet_train)] <- 0
x_diet_test[is.na(x_diet_test)] <- 0

y_train <- as.numeric(diet_outcome_cov$hs_zbmi_who[train_indices])
y_test <- as.numeric(diet_outcome_cov$hs_zbmi_who[test_indices])

# fit models
fit_without_covariates <- cv.glmnet(x_diet_train, y_train, alpha = 1, family = "gaussian")
fit_without_covariates
## 
## Call:  cv.glmnet(x = x_diet_train, y = y_train, alpha = 1, family = "gaussian") 
## 
## Measure: Mean-Squared Error 
## 
##      Lambda Index Measure      SE Nonzero
## min 0.06922     9   1.431 0.06022       5
## 1se 0.14570     1   1.442 0.06160       0
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")

best_lambda <- fit_without_covariates$lambda.min  # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 41 x 1 sparse Matrix of class "dgCMatrix"
##                                         s1
## (Intercept)                     0.53256344
## h_bfdur_Ter(0,10.8]             .         
## h_bfdur_Ter(10.8,34.9]          .         
## h_bfdur_Ter(34.9,Inf]           .         
## hs_bakery_prod_Ter(2,6]         .         
## hs_bakery_prod_Ter(6,Inf]       .         
## hs_beverages_Ter(0.132,1]       .         
## hs_beverages_Ter(1,Inf]         .         
## hs_break_cer_Ter(1.1,5.5]       .         
## hs_break_cer_Ter(5.5,Inf]       .         
## hs_caff_drink_Ter(0.132,Inf]    .         
## hs_dairy_Ter(14.6,25.6]         .         
## hs_dairy_Ter(25.6,Inf]          .         
## hs_fastfood_Ter(0.132,0.5]      .         
## hs_fastfood_Ter(0.5,Inf]        .         
## hs_org_food_Ter(0.132,1]        .         
## hs_org_food_Ter(1,Inf]         -0.13588632
## hs_proc_meat_Ter(1.5,4]         .         
## hs_proc_meat_Ter(4,Inf]         .         
## hs_readymade_Ter(0.132,0.5]     .         
## hs_readymade_Ter(0.5,Inf]       .         
## hs_total_bread_Ter(7,17.5]      .         
## hs_total_bread_Ter(17.5,Inf]    .         
## hs_total_cereal_Ter(14.1,23.6]  .         
## hs_total_cereal_Ter(23.6,Inf]   .         
## hs_total_fish_Ter(1.5,3]        .         
## hs_total_fish_Ter(3,Inf]        .         
## hs_total_fruits_Ter(7,14.1]     .         
## hs_total_fruits_Ter(14.1,Inf]  -0.02481964
## hs_total_lipids_Ter(3,7]        .         
## hs_total_lipids_Ter(7,Inf]     -0.05164312
## hs_total_meat_Ter(6,9]          .         
## hs_total_meat_Ter(9,Inf]        .         
## hs_total_potatoes_Ter(3,4]      .         
## hs_total_potatoes_Ter(4,Inf]    .         
## hs_total_sweets_Ter(4.1,8.5]   -0.01594403
## hs_total_sweets_Ter(8.5,Inf]    .         
## hs_total_veg_Ter(6,8.5]         .         
## hs_total_veg_Ter(8.5,Inf]      -0.08180563
## hs_total_yog_Ter(6,8.5]         .         
## hs_total_yog_Ter(8.5,Inf]       .
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_diet_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)

cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.34942

Predicting Ridge

# RIDGE
fit_without_covariates <- cv.glmnet(x_diet_train, y_train, alpha = 0, family = "gaussian")
fit_without_covariates
## 
## Call:  cv.glmnet(x = x_diet_train, y = y_train, alpha = 0, family = "gaussian") 
## 
## Measure: Mean-Squared Error 
## 
##     Lambda Index Measure      SE Nonzero
## min   3.53    41   1.431 0.08497      40
## 1se 145.70     1   1.441 0.08233      40
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")

best_lambda <- fit_without_covariates$lambda.min  # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 41 x 1 sparse Matrix of class "dgCMatrix"
##                                           s1
## (Intercept)                     0.5163069457
## h_bfdur_Ter(0,10.8]            -0.0114164662
## h_bfdur_Ter(10.8,34.9]          0.0353770607
## h_bfdur_Ter(34.9,Inf]          -0.0138651651
## hs_bakery_prod_Ter(2,6]         0.0228606785
## hs_bakery_prod_Ter(6,Inf]      -0.0268639952
## hs_beverages_Ter(0.132,1]      -0.0065939314
## hs_beverages_Ter(1,Inf]        -0.0016124215
## hs_break_cer_Ter(1.1,5.5]      -0.0034207548
## hs_break_cer_Ter(5.5,Inf]      -0.0337182186
## hs_caff_drink_Ter(0.132,Inf]   -0.0143879393
## hs_dairy_Ter(14.6,25.6]         0.0355023507
## hs_dairy_Ter(25.6,Inf]         -0.0005581647
## hs_fastfood_Ter(0.132,0.5]      0.0161761119
## hs_fastfood_Ter(0.5,Inf]       -0.0001750742
## hs_org_food_Ter(0.132,1]        0.0151677373
## hs_org_food_Ter(1,Inf]         -0.0682466785
## hs_proc_meat_Ter(1.5,4]         0.0222199344
## hs_proc_meat_Ter(4,Inf]        -0.0187135643
## hs_readymade_Ter(0.132,0.5]    -0.0013536008
## hs_readymade_Ter(0.5,Inf]       0.0105115509
## hs_total_bread_Ter(7,17.5]     -0.0035702530
## hs_total_bread_Ter(17.5,Inf]   -0.0070550360
## hs_total_cereal_Ter(14.1,23.6]  0.0082269928
## hs_total_cereal_Ter(23.6,Inf]  -0.0131001584
## hs_total_fish_Ter(1.5,3]       -0.0346609367
## hs_total_fish_Ter(3,Inf]       -0.0051749487
## hs_total_fruits_Ter(7,14.1]     0.0266413533
## hs_total_fruits_Ter(14.1,Inf]  -0.0389551124
## hs_total_lipids_Ter(3,7]       -0.0022752284
## hs_total_lipids_Ter(7,Inf]     -0.0476627593
## hs_total_meat_Ter(6,9]          0.0007524275
## hs_total_meat_Ter(9,Inf]        0.0005196923
## hs_total_potatoes_Ter(3,4]      0.0105526823
## hs_total_potatoes_Ter(4,Inf]    0.0048180175
## hs_total_sweets_Ter(4.1,8.5]   -0.0392140671
## hs_total_sweets_Ter(8.5,Inf]   -0.0010028529
## hs_total_veg_Ter(6,8.5]         0.0009962184
## hs_total_veg_Ter(8.5,Inf]      -0.0556956882
## hs_total_yog_Ter(6,8.5]        -0.0102351610
## hs_total_yog_Ter(8.5,Inf]      -0.0089303177
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_diet_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)

cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.326308

Predicting Elastic Net

#ELASTIC NET
fit_without_covariates <- cv.glmnet(x_diet_train, y_train, alpha = 0.5, family = "gaussian")
fit_without_covariates
## 
## Call:  cv.glmnet(x = x_diet_train, y = y_train, alpha = 0.5, family = "gaussian") 
## 
## Measure: Mean-Squared Error 
## 
##      Lambda Index Measure      SE Nonzero
## min 0.07218    16   1.430 0.05641      12
## 1se 0.29139     1   1.444 0.05877       0
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")

best_lambda <- fit_without_covariates$lambda.min  # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 41 x 1 sparse Matrix of class "dgCMatrix"
##                                          s1
## (Intercept)                     0.650606526
## h_bfdur_Ter(0,10.8]             .          
## h_bfdur_Ter(10.8,34.9]          0.039832328
## h_bfdur_Ter(34.9,Inf]           .          
## hs_bakery_prod_Ter(2,6]         .          
## hs_bakery_prod_Ter(6,Inf]      -0.052635590
## hs_beverages_Ter(0.132,1]       .          
## hs_beverages_Ter(1,Inf]         .          
## hs_break_cer_Ter(1.1,5.5]       .          
## hs_break_cer_Ter(5.5,Inf]      -0.054788470
## hs_caff_drink_Ter(0.132,Inf]    .          
## hs_dairy_Ter(14.6,25.6]         0.053455833
## hs_dairy_Ter(25.6,Inf]          .          
## hs_fastfood_Ter(0.132,0.5]      .          
## hs_fastfood_Ter(0.5,Inf]        .          
## hs_org_food_Ter(0.132,1]        .          
## hs_org_food_Ter(1,Inf]         -0.185235916
## hs_proc_meat_Ter(1.5,4]         0.008558872
## hs_proc_meat_Ter(4,Inf]         .          
## hs_readymade_Ter(0.132,0.5]     .          
## hs_readymade_Ter(0.5,Inf]       .          
## hs_total_bread_Ter(7,17.5]      .          
## hs_total_bread_Ter(17.5,Inf]    .          
## hs_total_cereal_Ter(14.1,23.6]  .          
## hs_total_cereal_Ter(23.6,Inf]   .          
## hs_total_fish_Ter(1.5,3]       -0.057540803
## hs_total_fish_Ter(3,Inf]        .          
## hs_total_fruits_Ter(7,14.1]     0.017171763
## hs_total_fruits_Ter(14.1,Inf]  -0.054914989
## hs_total_lipids_Ter(3,7]        .          
## hs_total_lipids_Ter(7,Inf]     -0.094342286
## hs_total_meat_Ter(6,9]          .          
## hs_total_meat_Ter(9,Inf]        .          
## hs_total_potatoes_Ter(3,4]      .          
## hs_total_potatoes_Ter(4,Inf]    .          
## hs_total_sweets_Ter(4.1,8.5]   -0.089860153
## hs_total_sweets_Ter(8.5,Inf]    .          
## hs_total_veg_Ter(6,8.5]         .          
## hs_total_veg_Ter(8.5,Inf]      -0.118161721
## hs_total_yog_Ter(6,8.5]         .          
## hs_total_yog_Ter(8.5,Inf]       .
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_diet_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)

cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.335144
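
Before moving to the combined data, the three diet-only fits can be put side by side. A minimal base-R sketch, using the test MSEs transcribed from the outputs above:

```r
# Test MSEs transcribed from the diet-only model outputs above
diet_mse <- c(lasso = 1.34942, ridge = 1.326308, enet = 1.335144)

comparison <- data.frame(
  model     = names(diet_mse),
  test_mse  = unname(diet_mse),
  test_rmse = sqrt(unname(diet_mse)),  # error on the BMI z-score scale
  stringsAsFactors = FALSE
)
comparison[order(comparison$test_mse), ]  # ridge has the lowest test MSE here
```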

Combined Data (Chemicals & Postnatal Diet)

Predicting Lasso

set.seed(101)
train_indices <- sample(seq_len(nrow(interested_data)), size = floor(0.7 * nrow(interested_data)))
test_indices <- setdiff(seq_len(nrow(interested_data)), train_indices)

diet_data <- interested_data[, postnatal_diet]
x_diet_train <- model.matrix(~ . + 0, data = diet_data[train_indices, ])
x_diet_test <- model.matrix(~ . + 0, data = diet_data[test_indices, ])

chemical_data <- interested_data[, chemicals_full]
x_chemical_train <- as.matrix(chemical_data[train_indices, ])
x_chemical_test <- as.matrix(chemical_data[test_indices, ])

covariates <- interested_data[, c("e3_sex_None", "e3_yearbir_None", "h_cohort", "hs_child_age_None")]
x_covariates_train <- model.matrix(~ . + 0, data = covariates[train_indices, ])
x_covariates_test <- model.matrix(~ . + 0, data = covariates[test_indices, ])

# combine diet and chemical data with and without covariates
x_combined_train <- cbind(x_diet_train, x_chemical_train)
x_combined_test <- cbind(x_diet_test, x_chemical_test)

x_full_train <- cbind(x_combined_train, x_covariates_train)
x_full_test <- cbind(x_combined_test, x_covariates_test)

# replace any remaining missing values with 0 (simple zero-imputation)
x_full_train[is.na(x_full_train)] <- 0
x_full_test[is.na(x_full_test)] <- 0
x_combined_train[is.na(x_combined_train)] <- 0
x_combined_test[is.na(x_combined_test)] <- 0

y_train <- as.numeric(interested_data$hs_zbmi_who[train_indices])
y_test <- as.numeric(interested_data$hs_zbmi_who[test_indices])

# LASSO
fit_without_covariates <- cv.glmnet(x_combined_train, y_train, alpha = 1, family = "gaussian")
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_combined_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)

plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")

best_lambda <- fit_without_covariates$lambda.min  # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 90 x 1 sparse Matrix of class "dgCMatrix"
##                                          s1
## (Intercept)                    -5.016149911
## h_bfdur_Ter(0,10.8]            -0.129594522
## h_bfdur_Ter(10.8,34.9]          .          
## h_bfdur_Ter(34.9,Inf]           .          
## hs_bakery_prod_Ter(2,6]         .          
## hs_bakery_prod_Ter(6,Inf]      -0.217291423
## hs_beverages_Ter(0.132,1]       .          
## hs_beverages_Ter(1,Inf]         .          
## hs_break_cer_Ter(1.1,5.5]       .          
## hs_break_cer_Ter(5.5,Inf]       .          
## hs_caff_drink_Ter(0.132,Inf]    .          
## hs_dairy_Ter(14.6,25.6]         0.009808165
## hs_dairy_Ter(25.6,Inf]          .          
## hs_fastfood_Ter(0.132,0.5]      0.070972556
## hs_fastfood_Ter(0.5,Inf]        .          
## hs_org_food_Ter(0.132,1]        .          
## hs_org_food_Ter(1,Inf]          .          
## hs_proc_meat_Ter(1.5,4]         .          
## hs_proc_meat_Ter(4,Inf]         .          
## hs_readymade_Ter(0.132,0.5]     .          
## hs_readymade_Ter(0.5,Inf]       0.011160944
## hs_total_bread_Ter(7,17.5]     -0.010168208
## hs_total_bread_Ter(17.5,Inf]    .          
## hs_total_cereal_Ter(14.1,23.6]  .          
## hs_total_cereal_Ter(23.6,Inf]   .          
## hs_total_fish_Ter(1.5,3]       -0.024288530
## hs_total_fish_Ter(3,Inf]        .          
## hs_total_fruits_Ter(7,14.1]     .          
## hs_total_fruits_Ter(14.1,Inf]  -0.016129393
## hs_total_lipids_Ter(3,7]        .          
## hs_total_lipids_Ter(7,Inf]     -0.047350302
## hs_total_meat_Ter(6,9]          .          
## hs_total_meat_Ter(9,Inf]        .          
## hs_total_potatoes_Ter(3,4]      0.018317955
## hs_total_potatoes_Ter(4,Inf]    .          
## hs_total_sweets_Ter(4.1,8.5]   -0.006515994
## hs_total_sweets_Ter(8.5,Inf]    .          
## hs_total_veg_Ter(6,8.5]         .          
## hs_total_veg_Ter(8.5,Inf]      -0.041036632
## hs_total_yog_Ter(6,8.5]         .          
## hs_total_yog_Ter(8.5,Inf]       .          
## hs_as_c_Log2                    .          
## hs_cd_c_Log2                   -0.022337287
## hs_co_c_Log2                   -0.003616434
## hs_cs_c_Log2                    0.070483114
## hs_cu_c_Log2                    0.656568320
## hs_hg_c_Log2                   -0.012267249
## hs_mn_c_Log2                    .          
## hs_mo_c_Log2                   -0.097496432
## hs_pb_c_Log2                    .          
## hs_tl_cdich_None                .          
## hs_dde_cadj_Log2               -0.029771276
## hs_ddt_cadj_Log2                .          
## hs_hcb_cadj_Log2                .          
## hs_pcb118_cadj_Log2             .          
## hs_pcb138_cadj_Log2             .          
## hs_pcb153_cadj_Log2            -0.226942147
## hs_pcb170_cadj_Log2            -0.054403335
## hs_pcb180_cadj_Log2             .          
## hs_dep_cadj_Log2               -0.017878387
## hs_detp_cadj_Log2               .          
## hs_dmdtp_cdich_None             .          
## hs_dmp_cadj_Log2                .          
## hs_dmtp_cadj_Log2               .          
## hs_pbde153_cadj_Log2           -0.035568595
## hs_pbde47_cadj_Log2             .          
## hs_pfhxs_c_Log2                 .          
## hs_pfna_c_Log2                  .          
## hs_pfoa_c_Log2                 -0.125219198
## hs_pfos_c_Log2                 -0.047655946
## hs_pfunda_c_Log2                .          
## hs_bpa_cadj_Log2                .          
## hs_bupa_cadj_Log2               .          
## hs_etpa_cadj_Log2               .          
## hs_mepa_cadj_Log2               .          
## hs_oxbe_cadj_Log2               .          
## hs_prpa_cadj_Log2               .          
## hs_trcs_cadj_Log2               .          
## hs_mbzp_cadj_Log2               0.043689764
## hs_mecpp_cadj_Log2              .          
## hs_mehhp_cadj_Log2              .          
## hs_mehp_cadj_Log2               .          
## hs_meohp_cadj_Log2              .          
## hs_mep_cadj_Log2                .          
## hs_mibp_cadj_Log2              -0.040902710
## hs_mnbp_cadj_Log2              -0.007173325
## hs_ohminp_cadj_Log2             .          
## hs_oxominp_cadj_Log2            .          
## hs_cotinine_cdich_None          .          
## hs_globalexp2_None              .
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.200253

Predicting Ridge

# RIDGE
fit_without_covariates <- cv.glmnet(x_combined_train, y_train, alpha = 0, family = "gaussian")
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_combined_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)

plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")

best_lambda <- fit_without_covariates$lambda.min  # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 90 x 1 sparse Matrix of class "dgCMatrix"
##                                           s1
## (Intercept)                    -3.7486876482
## h_bfdur_Ter(0,10.8]            -0.0862270481
## h_bfdur_Ter(10.8,34.9]          0.0187498222
## h_bfdur_Ter(34.9,Inf]           0.0718972907
## hs_bakery_prod_Ter(2,6]        -0.0033853186
## hs_bakery_prod_Ter(6,Inf]      -0.1580980396
## hs_beverages_Ter(0.132,1]       0.0052318976
## hs_beverages_Ter(1,Inf]        -0.0339118523
## hs_break_cer_Ter(1.1,5.5]       0.0042988311
## hs_break_cer_Ter(5.5,Inf]      -0.0503391950
## hs_caff_drink_Ter(0.132,Inf]    0.0156001183
## hs_dairy_Ter(14.6,25.6]         0.0416574408
## hs_dairy_Ter(25.6,Inf]         -0.0174860568
## hs_fastfood_Ter(0.132,0.5]      0.0650667870
## hs_fastfood_Ter(0.5,Inf]       -0.0300919849
## hs_org_food_Ter(0.132,1]        0.0284491409
## hs_org_food_Ter(1,Inf]         -0.0490021669
## hs_proc_meat_Ter(1.5,4]         0.0055207383
## hs_proc_meat_Ter(4,Inf]        -0.0063080789
## hs_readymade_Ter(0.132,0.5]     0.0307292842
## hs_readymade_Ter(0.5,Inf]       0.0632539981
## hs_total_bread_Ter(7,17.5]     -0.0544944827
## hs_total_bread_Ter(17.5,Inf]    0.0146129335
## hs_total_cereal_Ter(14.1,23.6] -0.0004875292
## hs_total_cereal_Ter(23.6,Inf]   0.0180167268
## hs_total_fish_Ter(1.5,3]       -0.0683250014
## hs_total_fish_Ter(3,Inf]        0.0112125503
## hs_total_fruits_Ter(7,14.1]     0.0353241028
## hs_total_fruits_Ter(14.1,Inf]  -0.0433100932
## hs_total_lipids_Ter(3,7]       -0.0171427895
## hs_total_lipids_Ter(7,Inf]     -0.0848619938
## hs_total_meat_Ter(6,9]          0.0172861408
## hs_total_meat_Ter(9,Inf]        0.0044053472
## hs_total_potatoes_Ter(3,4]      0.0536415284
## hs_total_potatoes_Ter(4,Inf]   -0.0115575388
## hs_total_sweets_Ter(4.1,8.5]   -0.0692484887
## hs_total_sweets_Ter(8.5,Inf]   -0.0097071229
## hs_total_veg_Ter(6,8.5]         0.0031586461
## hs_total_veg_Ter(8.5,Inf]      -0.0567605211
## hs_total_yog_Ter(6,8.5]        -0.0245534422
## hs_total_yog_Ter(8.5,Inf]      -0.0386998840
## hs_as_c_Log2                    0.0050439215
## hs_cd_c_Log2                   -0.0352737869
## hs_co_c_Log2                   -0.0396473666
## hs_cs_c_Log2                    0.0905666600
## hs_cu_c_Log2                    0.5291861050
## hs_hg_c_Log2                   -0.0253437065
## hs_mn_c_Log2                   -0.0187832842
## hs_mo_c_Log2                   -0.0835328881
## hs_pb_c_Log2                   -0.0275390915
## hs_tl_cdich_None                .           
## hs_dde_cadj_Log2               -0.0366806354
## hs_ddt_cadj_Log2                0.0032185740
## hs_hcb_cadj_Log2               -0.0317509983
## hs_pcb118_cadj_Log2             0.0025521400
## hs_pcb138_cadj_Log2            -0.0518399321
## hs_pcb153_cadj_Log2            -0.1215197442
## hs_pcb170_cadj_Log2            -0.0418593821
## hs_pcb180_cadj_Log2            -0.0225049584
## hs_dep_cadj_Log2               -0.0189572104
## hs_detp_cadj_Log2               0.0059280868
## hs_dmdtp_cdich_None             .           
## hs_dmp_cadj_Log2               -0.0024527279
## hs_dmtp_cadj_Log2               0.0008420662
## hs_pbde153_cadj_Log2           -0.0277474044
## hs_pbde47_cadj_Log2             0.0052481134
## hs_pfhxs_c_Log2                -0.0305593699
## hs_pfna_c_Log2                 -0.0041077407
## hs_pfoa_c_Log2                 -0.1108211867
## hs_pfos_c_Log2                 -0.0475012252
## hs_pfunda_c_Log2                0.0072385180
## hs_bpa_cadj_Log2               -0.0063616978
## hs_bupa_cadj_Log2               0.0036910227
## hs_etpa_cadj_Log2              -0.0049963326
## hs_mepa_cadj_Log2              -0.0096168009
## hs_oxbe_cadj_Log2               0.0101184198
## hs_prpa_cadj_Log2               0.0061492375
## hs_trcs_cadj_Log2               0.0062007460
## hs_mbzp_cadj_Log2               0.0427566149
## hs_mecpp_cadj_Log2              0.0064702943
## hs_mehhp_cadj_Log2              0.0137458333
## hs_mehp_cadj_Log2              -0.0049569930
## hs_meohp_cadj_Log2              0.0093327409
## hs_mep_cadj_Log2                0.0064280760
## hs_mibp_cadj_Log2              -0.0385576131
## hs_mnbp_cadj_Log2              -0.0344895032
## hs_ohminp_cadj_Log2            -0.0210558232
## hs_oxominp_cadj_Log2            0.0113725933
## hs_cotinine_cdich_None          .           
## hs_globalexp2_None              .
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.154844

Predicting Elastic Net

# ELASTIC NET
fit_without_covariates <- cv.glmnet(x_combined_train, y_train, alpha = 0.5, family = "gaussian")
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_combined_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)

plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")

best_lambda <- fit_without_covariates$lambda.min  # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 90 x 1 sparse Matrix of class "dgCMatrix"
##                                           s1
## (Intercept)                    -5.0446731390
## h_bfdur_Ter(0,10.8]            -0.1251508851
## h_bfdur_Ter(10.8,34.9]          .           
## h_bfdur_Ter(34.9,Inf]           0.0089409662
## hs_bakery_prod_Ter(2,6]         .           
## hs_bakery_prod_Ter(6,Inf]      -0.2141179309
## hs_beverages_Ter(0.132,1]       .           
## hs_beverages_Ter(1,Inf]         .           
## hs_break_cer_Ter(1.1,5.5]       .           
## hs_break_cer_Ter(5.5,Inf]       .           
## hs_caff_drink_Ter(0.132,Inf]    .           
## hs_dairy_Ter(14.6,25.6]         0.0175630268
## hs_dairy_Ter(25.6,Inf]          .           
## hs_fastfood_Ter(0.132,0.5]      0.0739162130
## hs_fastfood_Ter(0.5,Inf]        .           
## hs_org_food_Ter(0.132,1]        .           
## hs_org_food_Ter(1,Inf]         -0.0049809964
## hs_proc_meat_Ter(1.5,4]         .           
## hs_proc_meat_Ter(4,Inf]         .           
## hs_readymade_Ter(0.132,0.5]     .           
## hs_readymade_Ter(0.5,Inf]       0.0170253867
## hs_total_bread_Ter(7,17.5]     -0.0178078486
## hs_total_bread_Ter(17.5,Inf]    .           
## hs_total_cereal_Ter(14.1,23.6]  .           
## hs_total_cereal_Ter(23.6,Inf]   .           
## hs_total_fish_Ter(1.5,3]       -0.0311197485
## hs_total_fish_Ter(3,Inf]        .           
## hs_total_fruits_Ter(7,14.1]     0.0058224545
## hs_total_fruits_Ter(14.1,Inf]  -0.0180115810
## hs_total_lipids_Ter(3,7]        .           
## hs_total_lipids_Ter(7,Inf]     -0.0529611086
## hs_total_meat_Ter(6,9]          .           
## hs_total_meat_Ter(9,Inf]        .           
## hs_total_potatoes_Ter(3,4]      0.0233163117
## hs_total_potatoes_Ter(4,Inf]    .           
## hs_total_sweets_Ter(4.1,8.5]   -0.0128007469
## hs_total_sweets_Ter(8.5,Inf]    .           
## hs_total_veg_Ter(6,8.5]         .           
## hs_total_veg_Ter(8.5,Inf]      -0.0436127333
## hs_total_yog_Ter(6,8.5]         .           
## hs_total_yog_Ter(8.5,Inf]       .           
## hs_as_c_Log2                    .           
## hs_cd_c_Log2                   -0.0242086233
## hs_co_c_Log2                   -0.0084586207
## hs_cs_c_Log2                    0.0772668783
## hs_cu_c_Log2                    0.6571106900
## hs_hg_c_Log2                   -0.0143619887
## hs_mn_c_Log2                    .           
## hs_mo_c_Log2                   -0.0974389913
## hs_pb_c_Log2                    .           
## hs_tl_cdich_None                .           
## hs_dde_cadj_Log2               -0.0321212412
## hs_ddt_cadj_Log2                .           
## hs_hcb_cadj_Log2                .           
## hs_pcb118_cadj_Log2             .           
## hs_pcb138_cadj_Log2             .           
## hs_pcb153_cadj_Log2            -0.2221589832
## hs_pcb170_cadj_Log2            -0.0546331904
## hs_pcb180_cadj_Log2             .           
## hs_dep_cadj_Log2               -0.0179999867
## hs_detp_cadj_Log2               .           
## hs_dmdtp_cdich_None             .           
## hs_dmp_cadj_Log2                .           
## hs_dmtp_cadj_Log2               .           
## hs_pbde153_cadj_Log2           -0.0351341084
## hs_pbde47_cadj_Log2             .           
## hs_pfhxs_c_Log2                -0.0055363055
## hs_pfna_c_Log2                  .           
## hs_pfoa_c_Log2                 -0.1254532888
## hs_pfos_c_Log2                 -0.0469893259
## hs_pfunda_c_Log2                .           
## hs_bpa_cadj_Log2                .           
## hs_bupa_cadj_Log2               .           
## hs_etpa_cadj_Log2               .           
## hs_mepa_cadj_Log2               .           
## hs_oxbe_cadj_Log2               .           
## hs_prpa_cadj_Log2               0.0001965683
## hs_trcs_cadj_Log2               .           
## hs_mbzp_cadj_Log2               0.0457827093
## hs_mecpp_cadj_Log2              .           
## hs_mehhp_cadj_Log2              .           
## hs_mehp_cadj_Log2               .           
## hs_meohp_cadj_Log2              .           
## hs_mep_cadj_Log2                .           
## hs_mibp_cadj_Log2              -0.0415220843
## hs_mnbp_cadj_Log2              -0.0111086286
## hs_ohminp_cadj_Log2             .           
## hs_oxominp_cadj_Log2            .           
## hs_cotinine_cdich_None          .           
## hs_globalexp2_None              .
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.198308

Finalized Data

Features were selected based on the elastic net fit without covariates.

Still deciding whether to keep the continuous outcome or dichotomize it (to report sensitivity/specificity). The covariates are given a penalty factor of 0 in the lasso, ridge, and elastic net fits so they are not shrunk.
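
A subtlety worth noting when freezing covariates with `penalty.factor`: `model.matrix` expands factors such as `e3_sex_None` into dummy columns (`e3_sex_Nonefemale`, `e3_sex_Nonemale`, ...), so matching the expanded column names against the raw covariate names requires a prefix match rather than exact equality. A minimal sketch with mock column names (illustrative, not the real design matrix):

```r
covariates_selected <- c("hs_child_age_None", "h_cohort", "e3_sex_None", "e3_yearbir_None")

# mock model.matrix column names: factors are expanded into dummy columns
cols <- c("e3_sex_Nonefemale", "e3_sex_Nonemale", "h_cohort2", "h_cohort3",
          "hs_child_age_None", "hs_total_veg_Ter(6,8.5]")

# exact matching misses the dummy-expanded covariates
sum(cols %in% covariates_selected)          # only hs_child_age_None matches

# prefix matching flags every covariate column
is_cov <- Reduce(`|`, lapply(covariates_selected, function(v) startsWith(cols, v)))
penalty_factors <- ifelse(is_cov, 0, 1)     # 0 = unpenalized (frozen), 1 = penalized
penalty_factors
```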

# chemicals selected by the elastic net model
chemicals_selected <- c(
  "hs_cd_c_Log2",
  "hs_co_c_Log2",
  "hs_cs_c_Log2",
  "hs_cu_c_Log2",
  "hs_hg_c_Log2",
  "hs_mo_c_Log2",
  "hs_pb_c_Log2",
  "hs_dde_cadj_Log2",
  "hs_pcb153_cadj_Log2",
  "hs_pcb170_cadj_Log2",
  "hs_dep_cadj_Log2",
  "hs_detp_cadj_Log2",
  "hs_pbde153_cadj_Log2",
  "hs_pfhxs_c_Log2",
  "hs_pfoa_c_Log2",
  "hs_pfos_c_Log2",
  "hs_mepa_cadj_Log2",
  "hs_oxbe_cadj_Log2",
  "hs_prpa_cadj_Log2",
  "hs_mbzp_cadj_Log2",
  "hs_mibp_cadj_Log2",
  "hs_mnbp_cadj_Log2")
# diet variables selected by the elastic net model
diet_selected <- c(
  "h_bfdur_Ter",
  "hs_bakery_prod_Ter",
  "hs_break_cer_Ter",
  "hs_dairy_Ter",
  "hs_fastfood_Ter",
  "hs_org_food_Ter",
  "hs_proc_meat_Ter",
  "hs_total_fish_Ter",
  "hs_total_fruits_Ter",
  "hs_total_lipids_Ter",
  "hs_total_sweets_Ter",
  "hs_total_veg_Ter"
)
combined_data_selected <- c(
  "h_bfdur_Ter",
  "hs_bakery_prod_Ter",
  "hs_dairy_Ter",
  "hs_fastfood_Ter",
  "hs_org_food_Ter",
  "hs_readymade_Ter",
  "hs_total_bread_Ter",
  "hs_total_fish_Ter",
  "hs_total_fruits_Ter",
  "hs_total_lipids_Ter",
  "hs_total_potatoes_Ter",
  "hs_total_sweets_Ter",
  "hs_total_veg_Ter",
  "hs_cd_c_Log2",
  "hs_co_c_Log2",
  "hs_cs_c_Log2",
  "hs_cu_c_Log2",
  "hs_hg_c_Log2",
  "hs_mo_c_Log2",
  "hs_pb_c_Log2",
  "hs_dde_cadj_Log2",
  "hs_pcb153_cadj_Log2",
  "hs_pcb170_cadj_Log2",
  "hs_dep_cadj_Log2",
  "hs_pbde153_cadj_Log2",
  "hs_pfhxs_c_Log2",
  "hs_pfoa_c_Log2",
  "hs_pfos_c_Log2",
  "hs_prpa_cadj_Log2",
  "hs_mbzp_cadj_Log2",
  "hs_mibp_cadj_Log2",
  "hs_mnbp_cadj_Log2"
)

outcome_cov <- cbind(covariate_data, outcome_BMI)
outcome_cov <- outcome_cov[, !duplicated(colnames(outcome_cov))]

finalized_columns <- combined_data_selected
final_selected_data <- exposome %>% dplyr::select(all_of(finalized_columns))

finalized_data <- cbind(outcome_cov, final_selected_data)
head(finalized_data)
numeric_finalized <- finalized_data %>%
  dplyr::select(where(is.numeric))

cor_matrix <- cor(numeric_finalized, use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper", tl.col = "black", tl.srt = 90, tl.cex = 0.6)

find_highly_correlated <- function(cor_matrix, threshold = 0.8) {
  cor_matrix[lower.tri(cor_matrix, diag = TRUE)] <- NA   # keep each pair only once
  cor_pairs <- as.data.frame(as.table(cor_matrix))       # long format: Var1, Var2, Freq
  cor_pairs <- na.omit(cor_pairs)
  cor_pairs <- cor_pairs[order(-abs(cor_pairs$Freq)), ]  # strongest correlations first
  cor_pairs %>% filter(abs(Freq) > threshold)
}

highly_correlated_pairs <- find_highly_correlated(cor_matrix, threshold = 0.50)
highly_correlated_pairs
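
As a sanity check, the same upper-triangle logic can be exercised on a small synthetic example in base R (illustrative variable names):

```r
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100, sd = 0.1)  # strongly correlated with x by construction
z <- rnorm(100)                # independent noise
m <- cor(cbind(x = x, y = y, z = z))

m[lower.tri(m, diag = TRUE)] <- NA            # keep each pair only once
pairs <- na.omit(as.data.frame(as.table(m)))  # long format: Var1, Var2, Freq
pairs <- pairs[order(-abs(pairs$Freq)), ]
subset(pairs, abs(Freq) > 0.8)                # only the x-y pair should survive
```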
set.seed(101)

# Splitting data into training and test sets
train_indices <- sample(seq_len(nrow(finalized_data)), size = floor(0.7 * nrow(finalized_data)))
test_indices <- setdiff(seq_len(nrow(finalized_data)), train_indices)

# Creating training and test datasets
train_data <- finalized_data[train_indices, ]
test_data <- finalized_data[test_indices, ]

# Separating predictors and outcome variable
x_train <- model.matrix(~ . + 0, data = train_data[ , !names(train_data) %in% "hs_zbmi_who"])
x_test <- model.matrix(~ . + 0, data = test_data[ , !names(test_data) %in% "hs_zbmi_who"])
y_train <- train_data$hs_zbmi_who
y_test <- test_data$hs_zbmi_who

covariates_selected <- c("hs_child_age_None", "h_cohort", "e3_sex_None", "e3_yearbir_None")

# freeze the covariates (penalty factor 0) so they are not shrunk;
# match on prefixes because model.matrix expands factors into dummy columns
penalty_factors <- rep(1, ncol(x_train))
is_covariate <- Reduce(`|`, lapply(covariates_selected, function(v) startsWith(colnames(x_train), v)))
penalty_factors[is_covariate] <- 0

Lasso

fit_lasso <- cv.glmnet(x_train, y_train, alpha = 1, family = "gaussian", penalty.factor = penalty_factors)
plot(fit_lasso$glmnet.fit, xvar = "lambda", main = "Coefficients Path")

best_lambda <- fit_lasso$lambda.min
coef(fit_lasso, s = best_lambda)
## 60 x 1 sparse Matrix of class "dgCMatrix"
##                                          s1
## (Intercept)                   -5.6985353717
## e3_sex_Nonefemale             -0.1668238621
## e3_sex_Nonemale                0.0002261185
## e3_yearbir_None2004           -0.0841376400
## e3_yearbir_None2005            0.0375948697
## e3_yearbir_None2006            .           
## e3_yearbir_None2007            .           
## e3_yearbir_None2008            .           
## e3_yearbir_None2009            .           
## h_cohort2                      .           
## h_cohort3                      0.5067254029
## h_cohort4                      0.4249059943
## h_cohort5                      .           
## h_cohort6                      0.2388316662
## hs_child_age_None             -0.0350128312
## h_bfdur_Ter(10.8,34.9]         0.0097153402
## h_bfdur_Ter(34.9,Inf]          0.2012940076
## hs_bakery_prod_Ter(2,6]       -0.0866997757
## hs_bakery_prod_Ter(6,Inf]     -0.2942452878
## hs_dairy_Ter(14.6,25.6]        0.0351375998
## hs_dairy_Ter(25.6,Inf]         .           
## hs_fastfood_Ter(0.132,0.5]     0.0918917850
## hs_fastfood_Ter(0.5,Inf]       .           
## hs_org_food_Ter(0.132,1]       0.0191025517
## hs_org_food_Ter(1,Inf]         .           
## hs_readymade_Ter(0.132,0.5]    .           
## hs_readymade_Ter(0.5,Inf]      0.0598690491
## hs_total_bread_Ter(7,17.5]    -0.0902452806
## hs_total_bread_Ter(17.5,Inf]   0.0119131519
## hs_total_fish_Ter(1.5,3]      -0.0124330667
## hs_total_fish_Ter(3,Inf]       .           
## hs_total_fruits_Ter(7,14.1]    0.0166481440
## hs_total_fruits_Ter(14.1,Inf] -0.0090292007
## hs_total_lipids_Ter(3,7]       .           
## hs_total_lipids_Ter(7,Inf]    -0.0327942748
## hs_total_potatoes_Ter(3,4]     0.0228659156
## hs_total_potatoes_Ter(4,Inf]   .           
## hs_total_sweets_Ter(4.1,8.5]  -0.0496090089
## hs_total_sweets_Ter(8.5,Inf]   .           
## hs_total_veg_Ter(6,8.5]        .           
## hs_total_veg_Ter(8.5,Inf]     -0.0080840054
## hs_cd_c_Log2                  -0.0207348833
## hs_co_c_Log2                  -0.0224205898
## hs_cs_c_Log2                   0.2276908962
## hs_cu_c_Log2                   0.7827340672
## hs_hg_c_Log2                  -0.0284818916
## hs_mo_c_Log2                  -0.1097010089
## hs_pb_c_Log2                  -0.0499334790
## hs_dde_cadj_Log2              -0.0662267011
## hs_pcb153_cadj_Log2           -0.3067102430
## hs_pcb170_cadj_Log2           -0.0602662156
## hs_dep_cadj_Log2              -0.0182989275
## hs_pbde153_cadj_Log2          -0.0309014532
## hs_pfhxs_c_Log2                .           
## hs_pfoa_c_Log2                -0.1328690220
## hs_pfos_c_Log2                 .           
## hs_prpa_cadj_Log2              .           
## hs_mbzp_cadj_Log2              0.0678235403
## hs_mibp_cadj_Log2             -0.0484645526
## hs_mnbp_cadj_Log2             -0.0203392623
predictions_lasso <- predict(fit_lasso, s = "lambda.min", newx = x_test)
mse_lasso <- mean((y_test - predictions_lasso)^2)
rmse_lasso <- sqrt(mse_lasso)

cat("Lasso Test MSE:", mse_lasso, "\n")
## Lasso Test MSE: 1.167525
cat("Lasso Test RMSE:", rmse_lasso, "\n")
## Lasso Test RMSE: 1.080521

Without the metabolomics data, this model performs worse than the ensemble tree-based methods but slightly better than the single decision tree. The test MSE suggests the model may not be capturing all of the important predictors.
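
Tabulating the test MSEs reported so far makes the comparison concrete; a base-R sketch using values transcribed from the outputs above:

```r
# Test MSEs transcribed from the model outputs above
results <- data.frame(
  model    = c("lasso (combined)", "ridge (combined)", "enet (combined)",
               "lasso (finalized)"),
  test_mse = c(1.200253, 1.154844, 1.198308, 1.167525),
  stringsAsFactors = FALSE
)
results$test_rmse <- sqrt(results$test_mse)  # on the BMI z-score scale
results[order(results$test_mse), ]
```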

Ridge

fit_ridge <- cv.glmnet(x_train, y_train, alpha = 0, family = "gaussian", penalty.factor = penalty_factors)
plot(fit_ridge$glmnet.fit, xvar = "lambda", main = "Coefficients Path")

best_lambda <- fit_ridge$lambda.min
coef(fit_ridge, s = best_lambda)
## 60 x 1 sparse Matrix of class "dgCMatrix"
##                                         s1
## (Intercept)                   -5.360717779
## e3_sex_Nonefemale             -0.088295918
## e3_sex_Nonemale                0.087991394
## e3_yearbir_None2004           -0.107373556
## e3_yearbir_None2005            0.062193699
## e3_yearbir_None2006           -0.019676639
## e3_yearbir_None2007            0.012254970
## e3_yearbir_None2008            0.011816768
## e3_yearbir_None2009            0.010953642
## h_cohort2                     -0.099039963
## h_cohort3                      0.319886958
## h_cohort4                      0.300745433
## h_cohort5                     -0.015972130
## h_cohort6                      0.195651696
## hs_child_age_None             -0.024588059
## h_bfdur_Ter(10.8,34.9]         0.047067568
## h_bfdur_Ter(34.9,Inf]          0.172900630
## hs_bakery_prod_Ter(2,6]       -0.078954077
## hs_bakery_prod_Ter(6,Inf]     -0.241871322
## hs_dairy_Ter(14.6,25.6]        0.065608410
## hs_dairy_Ter(25.6,Inf]         0.006126693
## hs_fastfood_Ter(0.132,0.5]     0.083362702
## hs_fastfood_Ter(0.5,Inf]      -0.020315338
## hs_org_food_Ter(0.132,1]       0.023475751
## hs_org_food_Ter(1,Inf]        -0.038322331
## hs_readymade_Ter(0.132,0.5]    0.043666397
## hs_readymade_Ter(0.5,Inf]      0.093291779
## hs_total_bread_Ter(7,17.5]    -0.093536237
## hs_total_bread_Ter(17.5,Inf]   0.028853864
## hs_total_fish_Ter(1.5,3]      -0.060082360
## hs_total_fish_Ter(3,Inf]      -0.020745375
## hs_total_fruits_Ter(7,14.1]    0.034090146
## hs_total_fruits_Ter(14.1,Inf] -0.031883486
## hs_total_lipids_Ter(3,7]      -0.011407376
## hs_total_lipids_Ter(7,Inf]    -0.062444313
## hs_total_potatoes_Ter(3,4]     0.036961034
## hs_total_potatoes_Ter(4,Inf]  -0.001355249
## hs_total_sweets_Ter(4.1,8.5]  -0.076087171
## hs_total_sweets_Ter(8.5,Inf]  -0.006805973
## hs_total_veg_Ter(6,8.5]        0.011144295
## hs_total_veg_Ter(8.5,Inf]     -0.029351482
## hs_cd_c_Log2                  -0.034602763
## hs_co_c_Log2                  -0.043154869
## hs_cs_c_Log2                   0.207751735
## hs_cu_c_Log2                   0.715889120
## hs_hg_c_Log2                  -0.028460365
## hs_mo_c_Log2                  -0.106176424
## hs_pb_c_Log2                  -0.044991194
## hs_dde_cadj_Log2              -0.069092487
## hs_pcb153_cadj_Log2           -0.245035050
## hs_pcb170_cadj_Log2           -0.059060813
## hs_dep_cadj_Log2              -0.019198325
## hs_pbde153_cadj_Log2          -0.031317144
## hs_pfhxs_c_Log2               -0.001324490
## hs_pfoa_c_Log2                -0.146318397
## hs_pfos_c_Log2                -0.017736195
## hs_prpa_cadj_Log2              0.001510970
## hs_mbzp_cadj_Log2              0.068119528
## hs_mibp_cadj_Log2             -0.043525595
## hs_mnbp_cadj_Log2             -0.036785984
predictions_ridge <- predict(fit_ridge, s = "lambda.min", newx = x_test)
mse_ridge <- mean((y_test - predictions_ridge)^2)
rmse_ridge <- sqrt(mse_ridge)

cat("Ridge Test MSE:", mse_ridge, "\n")
## Ridge Test MSE: 1.158747
cat("Ridge Test RMSE:", rmse_ridge, "\n")
## Ridge Test RMSE: 1.076451

Ridge performs comparably to Lasso, with a slightly lower test MSE (1.159 vs. 1.168). The two regularization approaches are roughly interchangeable in this setting.

Elastic Net

fit_enet <- cv.glmnet(x_train, y_train, alpha = 0.5, family = "gaussian", penalty.factor = penalty_factors)
plot(fit_enet$glmnet.fit, xvar = "lambda", main = "Elastic Net Coefficient Paths")  # xvar applies to the glmnet fit, not the cv.glmnet object

best_lambda <- fit_enet$lambda.min
coef(fit_enet, s = best_lambda)
## 60 x 1 sparse Matrix of class "dgCMatrix"
##                                         s1
## (Intercept)                   -5.762303040
## e3_sex_Nonefemale             -0.090651726
## e3_sex_Nonemale                0.077693065
## e3_yearbir_None2004           -0.088295376
## e3_yearbir_None2005            0.041355992
## e3_yearbir_None2006            .          
## e3_yearbir_None2007            .          
## e3_yearbir_None2008            .          
## e3_yearbir_None2009            .          
## h_cohort2                      .          
## h_cohort3                      0.494074994
## h_cohort4                      0.417214046
## h_cohort5                      .          
## h_cohort6                      0.237642922
## hs_child_age_None             -0.036131041
## h_bfdur_Ter(10.8,34.9]         0.014440657
## h_bfdur_Ter(34.9,Inf]          0.201063547
## hs_bakery_prod_Ter(2,6]       -0.088057490
## hs_bakery_prod_Ter(6,Inf]     -0.292160787
## hs_dairy_Ter(14.6,25.6]        0.038354060
## hs_dairy_Ter(25.6,Inf]         .          
## hs_fastfood_Ter(0.132,0.5]     0.093085875
## hs_fastfood_Ter(0.5,Inf]       .          
## hs_org_food_Ter(0.132,1]       0.020436077
## hs_org_food_Ter(1,Inf]        -0.001805079
## hs_readymade_Ter(0.132,0.5]    .          
## hs_readymade_Ter(0.5,Inf]      0.061533749
## hs_total_bread_Ter(7,17.5]    -0.091614236
## hs_total_bread_Ter(17.5,Inf]   0.013358251
## hs_total_fish_Ter(1.5,3]      -0.016884704
## hs_total_fish_Ter(3,Inf]       .          
## hs_total_fruits_Ter(7,14.1]    0.018443884
## hs_total_fruits_Ter(14.1,Inf] -0.011197518
## hs_total_lipids_Ter(3,7]       .          
## hs_total_lipids_Ter(7,Inf]    -0.035710715
## hs_total_potatoes_Ter(3,4]     0.023967123
## hs_total_potatoes_Ter(4,Inf]   .          
## hs_total_sweets_Ter(4.1,8.5]  -0.052069828
## hs_total_sweets_Ter(8.5,Inf]   .          
## hs_total_veg_Ter(6,8.5]        .          
## hs_total_veg_Ter(8.5,Inf]     -0.010627479
## hs_cd_c_Log2                  -0.022129337
## hs_co_c_Log2                  -0.024812845
## hs_cs_c_Log2                   0.226941434
## hs_cu_c_Log2                   0.781031155
## hs_hg_c_Log2                  -0.029052978
## hs_mo_c_Log2                  -0.109882410
## hs_pb_c_Log2                  -0.049780550
## hs_dde_cadj_Log2              -0.067106045
## hs_pcb153_cadj_Log2           -0.301987294
## hs_pcb170_cadj_Log2           -0.060537374
## hs_dep_cadj_Log2              -0.018471711
## hs_pbde153_cadj_Log2          -0.030949640
## hs_pfhxs_c_Log2                .          
## hs_pfoa_c_Log2                -0.135485296
## hs_pfos_c_Log2                 .          
## hs_prpa_cadj_Log2              .          
## hs_mbzp_cadj_Log2              0.068431006
## hs_mibp_cadj_Log2             -0.048154725
## hs_mnbp_cadj_Log2             -0.022366199
predictions_enet <- predict(fit_enet, s = "lambda.min", newx = x_test)
mse_enet <- mean((y_test - predictions_enet)^2)
rmse_enet <- sqrt(mse_enet)

cat("Elastic Net Test MSE:", mse_enet, "\n")
## Elastic Net Test MSE: 1.167095
cat("Elastic Net Test RMSE:", rmse_enet, "\n")
## Elastic Net Test RMSE: 1.080322

Elastic Net's performance is nearly identical to Lasso's (test MSE 1.167 vs. 1.168), indicating that combining the L1 and L2 penalties provides no meaningful advantage without the metabolomics data.
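Above, the mixing parameter was fixed at alpha = 0.5; it can instead be tuned alongside lambda. A minimal sketch (assuming the x_train, y_train, and penalty_factors objects defined earlier; the shared foldid keeps the comparison across alpha values fair):

```r
set.seed(101)
# shared fold assignments so every alpha is evaluated on identical folds
foldid <- sample(rep(1:10, length.out = nrow(x_train)))
alphas <- seq(0, 1, by = 0.1)
cv_err <- sapply(alphas, function(a) {
  fit <- cv.glmnet(x_train, y_train, alpha = a, family = "gaussian",
                   penalty.factor = penalty_factors, foldid = foldid)
  min(fit$cvm)  # CV mean squared error at lambda.min
})
best_alpha <- alphas[which.min(cv_err)]
cat("Best alpha:", best_alpha, "\n")
```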

Decision Trees

set.seed(101)
fit_tree_model <- rpart(hs_zbmi_who ~ ., data = train_data, method = "anova")
rpart.plot(fit_tree_model)

fit_tree_predictions <- predict(fit_tree_model, newdata = test_data)
fit_tree_mse <- mean((fit_tree_predictions - y_test)^2)
cat("Decision Tree Mean Squared Error on Test Set:", fit_tree_mse, "\n")
## Decision Tree Mean Squared Error on Test Set: 1.266187

The decision tree has the highest test MSE (1.266) of the models considered, indicating poor predictive accuracy. Single decision trees tend to overfit the training data and generalize poorly to unseen observations.
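One standard remedy for this overfitting is cost-complexity pruning, selecting the complexity parameter that minimizes cross-validated error from the fitted tree's cp table. A sketch (assuming fit_tree_model from above):

```r
printcp(fit_tree_model)  # cross-validated error for each cp value
# pick the cp minimizing the cross-validated error (xerror) and prune
best_cp <- fit_tree_model$cptable[which.min(fit_tree_model$cptable[, "xerror"]), "CP"]
pruned_tree <- prune(fit_tree_model, cp = best_cp)
rpart.plot(pruned_tree)
```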

Random Forest

rf_model <- randomForest(x_train, y_train, ntree=500, importance=TRUE)
predictions_rf <- predict(rf_model, x_test)
mse_rf <- mean((y_test - predictions_rf)^2)
rmse_rf <- sqrt(mse_rf)

cat("Random Forest Test MSE:", mse_rf, "\n")
## Random Forest Test MSE: 1.145196
cat("Random Forest Test RMSE:", rmse_rf, "\n")
## Random Forest Test RMSE: 1.070138
par(mfrow = c(1, 1), mar = c(5, 4, 4, 2) + 0.1)
varImpPlot(rf_model)

The random forest improves on the single decision tree by averaging many bootstrapped trees, which reduces overfitting. Its lower test MSE (1.145) reflects better predictive accuracy and robustness.
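The permutation-based importance scores stored in the fitted forest can also be extracted as a ranked table rather than read off the plot. A sketch (assuming rf_model from above, which was fit with importance = TRUE):

```r
imp <- importance(rf_model, type = 1)           # %IncMSE (permutation importance)
imp_df <- data.frame(variable = rownames(imp), pct_inc_mse = imp[, 1])
head(imp_df[order(-imp_df$pct_inc_mse), ], 10)  # top 10 predictors
```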

GBM

gbm_model <- gbm(hs_zbmi_who ~ ., data = train_data, 
                 distribution = "gaussian",
                 n.trees = 1000,
                 interaction.depth = 3,
                 n.minobsinnode = 10,
                 shrinkage = 0.01,
                 cv.folds = 5,
                 verbose = TRUE)
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.4339             nan     0.0100    0.0024
##      2        1.4296             nan     0.0100    0.0031
##      3        1.4248             nan     0.0100    0.0032
##      4        1.4202             nan     0.0100    0.0039
##      5        1.4167             nan     0.0100    0.0029
##      6        1.4127             nan     0.0100    0.0033
##      7        1.4091             nan     0.0100    0.0033
##      8        1.4049             nan     0.0100    0.0031
##      9        1.4013             nan     0.0100    0.0034
##     10        1.3976             nan     0.0100    0.0028
##     20        1.3628             nan     0.0100    0.0027
##     40        1.3026             nan     0.0100    0.0030
##     60        1.2540             nan     0.0100    0.0023
##     80        1.2120             nan     0.0100    0.0012
##    100        1.1768             nan     0.0100    0.0010
##    120        1.1455             nan     0.0100    0.0002
##    140        1.1179             nan     0.0100    0.0001
##    160        1.0935             nan     0.0100    0.0001
##    180        1.0719             nan     0.0100    0.0005
##    200        1.0509             nan     0.0100    0.0005
##    220        1.0317             nan     0.0100   -0.0001
##    240        1.0134             nan     0.0100   -0.0004
##    260        0.9970             nan     0.0100    0.0000
##    280        0.9815             nan     0.0100    0.0005
##    300        0.9666             nan     0.0100   -0.0001
##    320        0.9537             nan     0.0100   -0.0001
##    340        0.9408             nan     0.0100    0.0001
##    360        0.9285             nan     0.0100    0.0000
##    380        0.9170             nan     0.0100   -0.0004
##    400        0.9065             nan     0.0100   -0.0001
##    420        0.8960             nan     0.0100   -0.0003
##    440        0.8858             nan     0.0100    0.0001
##    460        0.8767             nan     0.0100   -0.0002
##    480        0.8672             nan     0.0100   -0.0001
##    500        0.8577             nan     0.0100   -0.0002
##    520        0.8487             nan     0.0100   -0.0002
##    540        0.8389             nan     0.0100   -0.0002
##    560        0.8307             nan     0.0100   -0.0000
##    580        0.8226             nan     0.0100   -0.0004
##    600        0.8156             nan     0.0100   -0.0005
##    620        0.8077             nan     0.0100   -0.0002
##    640        0.8012             nan     0.0100   -0.0002
##    660        0.7937             nan     0.0100   -0.0002
##    680        0.7858             nan     0.0100    0.0001
##    700        0.7794             nan     0.0100   -0.0001
##    720        0.7728             nan     0.0100   -0.0001
##    740        0.7661             nan     0.0100   -0.0002
##    760        0.7606             nan     0.0100   -0.0004
##    780        0.7548             nan     0.0100   -0.0001
##    800        0.7495             nan     0.0100   -0.0005
##    820        0.7433             nan     0.0100   -0.0004
##    840        0.7378             nan     0.0100   -0.0002
##    860        0.7324             nan     0.0100   -0.0003
##    880        0.7262             nan     0.0100   -0.0000
##    900        0.7205             nan     0.0100    0.0000
##    920        0.7141             nan     0.0100   -0.0002
##    940        0.7095             nan     0.0100   -0.0003
##    960        0.7040             nan     0.0100   -0.0001
##    980        0.6989             nan     0.0100   -0.0004
##   1000        0.6940             nan     0.0100   -0.0003
# finding the best number of trees based on cross-validation
best_trees <- gbm.perf(gbm_model, method = "cv")

predictions_gbm <- predict(gbm_model, test_data, n.trees = best_trees)
mse_gbm <- mean((y_test - predictions_gbm)^2)
rmse_gbm <- sqrt(mse_gbm)

cat("GBM Test MSE:", mse_gbm, "\n")
## GBM Test MSE: 1.115685
cat("GBM Test RMSE:", rmse_gbm, "\n")
## GBM Test RMSE: 1.05626
summary(gbm_model)

GBM outperforms both the decision tree and the random forest, with the lowest test MSE (1.116) among the tree-based models. By fitting each new tree to the residuals of the current ensemble, boosting iteratively corrects earlier errors, yielding better prediction accuracy.
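The held-out errors computed above can be collected into a single table for side-by-side comparison (assuming the mse_* and fit_tree_mse objects from the previous sections are still in the workspace):

```r
results <- data.frame(
  Model = c("Lasso", "Ridge", "Elastic Net", "Decision Tree",
            "Random Forest", "GBM"),
  MSE   = c(mse_lasso, mse_ridge, mse_enet, fit_tree_mse, mse_rf, mse_gbm)
)
results$RMSE <- sqrt(results$MSE)
results[order(results$MSE), ]  # best model first
```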

Cross-Validation

control <- trainControl(method = "cv", number = 5)

# lasso with cross-validation
fit_lasso_cv <- train(x_train, y_train, method = "glmnet", trControl = control, tuneGrid = expand.grid(alpha = 1, lambda = fit_lasso$lambda.min), penalty.factor = penalty_factors)
print(fit_lasso_cv)
## glmnet 
## 
## 910 samples
##  59 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 727, 728, 728, 729, 728 
## Resampling results:
## 
##   RMSE      Rsquared   MAE      
##   1.062217  0.2188353  0.8389388
## 
## Tuning parameter 'alpha' was held constant at a value of 1
## Tuning
##  parameter 'lambda' was held constant at a value of 0.01635015
# ridge with cross-validation
fit_ridge_cv <- train(x_train, y_train, method = "glmnet", trControl = control, tuneGrid = expand.grid(alpha = 0, lambda = fit_ridge$lambda.min), penalty.factor = penalty_factors)
print(fit_ridge_cv)
## glmnet 
## 
## 910 samples
##  59 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 727, 728, 729, 728, 728 
## Resampling results:
## 
##   RMSE      Rsquared   MAE      
##   1.068294  0.2095173  0.8417059
## 
## Tuning parameter 'alpha' was held constant at a value of 0
## Tuning
##  parameter 'lambda' was held constant at a value of 0.2485077
# enet with cross-validation
fit_enet_cv <- train(x_train, y_train, method = "glmnet", trControl = control, tuneGrid = expand.grid(alpha = 0.5, lambda = fit_enet$lambda.min), penalty.factor = penalty_factors)
print(fit_enet_cv)
## glmnet 
## 
## 910 samples
##  59 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 727, 729, 727, 729, 728 
## Resampling results:
## 
##   RMSE      Rsquared   MAE      
##   1.066526  0.2136246  0.8426497
## 
## Tuning parameter 'alpha' was held constant at a value of 0.5
## Tuning
##  parameter 'lambda' was held constant at a value of 0.02979529
# decision tree with cross-validation
fit_tree_cv <- train(hs_zbmi_who ~ ., data = train_data, method = "rpart", trControl = control, tuneLength = 10)
print(fit_tree_cv)
## CART 
## 
## 910 samples
##  36 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 728, 727, 728, 729, 728 
## Resampling results across tuning parameters:
## 
##   cp           RMSE      Rsquared    MAE      
##   0.009952924  1.201210  0.09490860  0.9384751
##   0.010291878  1.192271  0.10115107  0.9293562
##   0.011008069  1.188957  0.09961121  0.9246674
##   0.011424452  1.182492  0.10395709  0.9190344
##   0.012325313  1.168467  0.10932984  0.9133457
##   0.016846553  1.153119  0.10289876  0.9029847
##   0.031002516  1.167910  0.07142021  0.9174089
##   0.034048114  1.173460  0.06383089  0.9194178
##   0.040866421  1.171104  0.06548200  0.9197945
##   0.075544119  1.182232  0.04422955  0.9371431
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was cp = 0.01684655.
rpart.plot(fit_tree_cv$finalModel)

fit_tree_predictions <- predict(fit_tree_cv, newdata = test_data)
fit_tree_mse <- mean((fit_tree_predictions - y_test)^2)
cat("Decision Tree Mean Squared Error on Test Set:", fit_tree_mse, "\n")
## Decision Tree Mean Squared Error on Test Set: 1.278299
# random forest with cross-validation
rf_cv <- train(x_train, y_train, method = "rf", trControl = control)
print(rf_cv)
## Random Forest 
## 
## 910 samples
##  59 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 729, 727, 728, 727, 729 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE      Rsquared   MAE      
##    2    1.117039  0.1722827  0.8821770
##   30    1.083520  0.1865641  0.8580197
##   59    1.089618  0.1750660  0.8599570
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 30.
# GBM with cross-validation
gbm_cv <- train(hs_zbmi_who ~ ., data = train_data, method = "gbm", trControl = control, verbose = FALSE)
print(gbm_cv)
## Stochastic Gradient Boosting 
## 
## 910 samples
##  36 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 728, 728, 728, 728, 728 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.trees  RMSE      Rsquared   MAE      
##   1                   50      1.095546  0.1700913  0.8703459
##   1                  100      1.079824  0.1912650  0.8627509
##   1                  150      1.081858  0.1910741  0.8612897
##   2                   50      1.083798  0.1891214  0.8587307
##   2                  100      1.069339  0.2121326  0.8482374
##   2                  150      1.075526  0.2075468  0.8518063
##   3                   50      1.079273  0.1927452  0.8514070
##   3                  100      1.080161  0.1989293  0.8574333
##   3                  150      1.090696  0.1902409  0.8665017
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## 
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 100, interaction.depth =
##  2, shrinkage = 0.1 and n.minobsinnode = 10.
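Because every model above was tuned with train(), caret's resamples() can compare their fold-level performance directly. A sketch (assuming the fitted train objects from this section; note that each call to train() drew its own folds, so the comparison is approximate unless a common index is supplied in trainControl):

```r
resamps <- resamples(list(Lasso = fit_lasso_cv, Ridge = fit_ridge_cv,
                          ElasticNet = fit_enet_cv, Tree = fit_tree_cv,
                          RF = rf_cv, GBM = gbm_cv))
summary(resamps)  # RMSE, R-squared, and MAE across the 5 folds
bwplot(resamps)   # lattice boxplots of the resampling distributions
```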

With Metabolomic Serum Data

First rows of the metabolomic serum data

load("/Users/allison/Library/CloudStorage/GoogleDrive-aflouie@usc.edu/My Drive/HELIX_data/metabol_serum.RData")
metabol_serum_transposed <- as.data.frame(t(metabol_serum.d))
metabol_serum_transposed$ID <- as.integer(rownames(metabol_serum_transposed))

# Add the ID column to the first position
metabol_serum_transposed <- metabol_serum_transposed[, c("ID", setdiff(names(metabol_serum_transposed), "ID"))]

# Now, the ID is the first column, and the layout is preserved
kable(head(metabol_serum_transposed), align = "c", digits = 2, format = "pipe")
|  ID  | metab_1 | metab_2 | metab_3 | metab_4 | metab_5 | metab_6 | metab_7 | metab_8 | metab_9 | metab_10 |
|:----:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|:--------:|
| 430  |  -2.15  |  -0.71  |  8.60   |  0.55   |  7.05   |  5.79   |  3.75   |  5.07   |  -1.87  |  -2.77   |
| 1187 |  -0.69  |  -0.37  |  9.15   |  -1.33  |  6.89   |  5.81   |  4.26   |  5.08   |  -2.30  |  -3.42   |
| 940  |  -0.69  |  -0.36  |  8.95   |  -0.13  |  7.10   |  5.86   |  4.35   |  5.92   |  -1.97  |  -3.40   |
| 936  |  -0.19  |  -0.34  |  8.54   |  -0.62  |  7.01   |  5.95   |  4.24   |  5.41   |  -1.89  |  -2.84   |
| 788  |  -1.96  |  -0.35  |  8.73   |  -0.80  |  6.90   |  5.95   |  4.88   |  5.39   |  -1.55  |  -2.45   |
| 698  |  -1.90  |  -0.63  |  8.24   |  -0.46  |  6.94   |  5.42   |  4.70   |  4.62   |  -1.78  |  -3.14   |

(Only the first 10 of the 177 metabolite columns are shown; the remaining columns follow the same layout.)
# ID is the common identifier in both datasets
combined_data <- merge(selected_id_data, metabol_serum_transposed, by = "ID", all = TRUE)

selected_metabolomics_data <- combined_data %>% dplyr::select(-c(ID))
head(selected_metabolomics_data)
# remove rows with any NA; listwise deletion may bias results, but imputing the metabolomics data reliably is difficult
selected_metabolomics_data <- selected_metabolomics_data %>% na.omit()

set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who, p = .7, 
                                  list = FALSE, 
                                  times = 1)
train_data <- selected_metabolomics_data[ trainIndex,]
test_data  <- selected_metabolomics_data[-trainIndex,]

x_train <- model.matrix(hs_zbmi_who ~ ., train_data)[,-1]
y_train <- train_data$hs_zbmi_who
x_test <- model.matrix(hs_zbmi_who ~ ., test_data)[,-1]
y_test <- test_data$hs_zbmi_who

# set penalty factor to 0 for the selected covariates so they are never shrunk (always retained in the model)
penalty_factors <- rep(1, ncol(x_train))
penalty_factors[colnames(x_train) %in% covariates_selected] <- 0

Lasso

lasso_model <- cv.glmnet(x_train, y_train, alpha = 1, family = "gaussian", penalty.factor = penalty_factors)
plot(lasso_model)

lasso_model$lambda.min
## [1] 0.006704838
coef(lasso_model, s = lasso_model$lambda.min)
## 236 x 1 sparse Matrix of class "dgCMatrix"
##                                          s1
## (Intercept)                   10.0873205451
## hs_child_age_None             -0.1775474762
## h_cohort2                     -0.0565985253
## h_cohort3                      0.2580648052
## h_cohort4                      0.2646356571
## h_cohort5                      .           
## h_cohort6                      0.0612509267
## e3_sex_Nonemale                0.3369389026
## e3_yearbir_None2004           -0.1041005789
## e3_yearbir_None2005           -0.0204083598
## e3_yearbir_None2006            0.0376738228
## e3_yearbir_None2007            0.0182175859
## e3_yearbir_None2008           -0.0624341617
## e3_yearbir_None2009            0.3498821133
## hs_cd_c_Log2                   0.0065030732
## hs_co_c_Log2                   .           
## hs_cs_c_Log2                   0.1133407986
## hs_cu_c_Log2                   0.1500790986
## hs_hg_c_Log2                  -0.0487964996
## hs_mo_c_Log2                  -0.0486721256
## hs_pb_c_Log2                   .           
## hs_dde_cadj_Log2              -0.0194811950
## hs_pcb153_cadj_Log2           -0.2143690955
## hs_pcb170_cadj_Log2           -0.0336622173
## hs_dep_cadj_Log2              -0.0104497209
## hs_pbde153_cadj_Log2          -0.0161435652
## hs_pfhxs_c_Log2                .           
## hs_pfoa_c_Log2                -0.0280969333
## hs_pfos_c_Log2                 0.0091062730
## hs_prpa_cadj_Log2             -0.0104495990
## hs_mbzp_cadj_Log2              0.0520375303
## hs_mibp_cadj_Log2              .           
## hs_mnbp_cadj_Log2             -0.0022858607
## h_bfdur_Ter(10.8,34.9]         0.1021939222
## h_bfdur_Ter(34.9,Inf]          0.1190424816
## hs_bakery_prod_Ter(2,6]        0.0124712032
## hs_bakery_prod_Ter(6,Inf]     -0.0932332482
## hs_dairy_Ter(14.6,25.6]       -0.0089891178
## hs_dairy_Ter(25.6,Inf]         0.0669697686
## hs_fastfood_Ter(0.132,0.5]     .           
## hs_fastfood_Ter(0.5,Inf]      -0.0206216746
## hs_org_food_Ter(0.132,1]       0.0467987299
## hs_org_food_Ter(1,Inf]         0.0217619401
## hs_readymade_Ter(0.132,0.5]    0.0785766629
## hs_readymade_Ter(0.5,Inf]      0.0527003752
## hs_total_bread_Ter(7,17.5]     .           
## hs_total_bread_Ter(17.5,Inf]   .           
## hs_total_fish_Ter(1.5,3]       .           
## hs_total_fish_Ter(3,Inf]      -0.0477288225
## hs_total_fruits_Ter(7,14.1]    0.0800526559
## hs_total_fruits_Ter(14.1,Inf]  0.1190703510
## hs_total_lipids_Ter(3,7]       0.0574076564
## hs_total_lipids_Ter(7,Inf]     .           
## hs_total_potatoes_Ter(3,4]     0.0122978901
## hs_total_potatoes_Ter(4,Inf]   .           
## hs_total_sweets_Ter(4.1,8.5]   .           
## hs_total_sweets_Ter(8.5,Inf]   .           
## hs_total_veg_Ter(6,8.5]        0.0311960357
## hs_total_veg_Ter(8.5,Inf]     -0.0003494278
## metab_1                       -0.0207480669
## metab_2                        0.0655176076
## metab_3                        0.0310770895
## metab_4                        0.0071875197
## metab_5                        0.4469691181
## metab_6                       -0.1016058787
## metab_7                        .           
## metab_8                        0.2251765767
## metab_9                        .           
## metab_10                       0.0021975504
## metab_11                       0.1928961474
## metab_12                      -0.1613798130
## metab_13                       .           
## metab_14                      -0.4415512911
## metab_15                       .           
## metab_16                       .           
## metab_17                       .           
## metab_18                      -0.1626661265
## metab_19                       .           
## metab_20                       .           
## metab_21                       0.0071702665
## metab_22                      -0.2548712997
## metab_23                       0.1397488802
## metab_24                       0.6283717502
## metab_25                      -0.1263241999
## metab_26                      -0.2461042658
## metab_27                       0.4955355316
## metab_28                       .           
## metab_29                      -0.0022813079
## metab_30                       0.1334679240
## metab_31                       0.0507349831
## metab_32                      -0.1306230178
## metab_33                       .           
## metab_34                      -0.0063353273
## metab_35                       .           
## metab_36                       .           
## metab_37                      -0.0349061111
## metab_38                      -0.0626106810
## metab_39                       .           
## metab_40                       0.0665059252
## metab_41                       0.2581546498
## metab_42                      -0.4074327373
## metab_43                      -0.1684543533
## metab_44                      -0.0181901550
## metab_45                       0.1260764176
## metab_46                       .           
## metab_47                       0.4838480040
## metab_48                      -0.7729424147
## metab_49                       0.1227097130
## metab_50                      -0.2151548676
## metab_51                       .           
## metab_52                       0.4114766746
## metab_53                       .           
## metab_54                       0.1252237052
## metab_55                       .           
## metab_56                      -0.1087354965
## metab_57                       .           
## metab_58                       .           
## metab_59                       0.6126085024
## metab_60                      -0.1510283973
## metab_61                       .           
## metab_62                       .           
## metab_63                      -0.1550611197
## metab_64                       .           
## metab_65                       .           
## metab_66                      -0.0615020127
## metab_67                      -0.2612115857
## metab_68                       0.1137045330
## metab_69                      -0.0553915023
## metab_70                       .           
## metab_71                      -0.0649183734
## metab_72                       .           
## metab_73                      -0.1152015876
## metab_74                       .           
## metab_75                       0.2830259776
## metab_76                       .           
## metab_77                       0.0131038076
## metab_78                      -0.1346197453
## metab_79                       .           
## metab_80                       .           
## metab_81                       .           
## metab_82                      -0.6376405315
## metab_83                       .           
## metab_84                      -0.1315511746
## metab_85                       .           
## metab_86                       0.3659692306
## metab_87                       0.0367345459
## metab_88                       0.6164520098
## metab_89                      -1.2934194237
## metab_90                       .           
## metab_91                       0.1349662127
## metab_92                       0.1163201230
## metab_93                       .           
## metab_94                      -0.0623490797
## metab_95                       1.6392898687
## metab_96                       .           
## metab_97                       .           
## metab_98                       .           
## metab_99                      -0.4562396279
## metab_100                      0.5892508350
## metab_101                      .           
## metab_102                      .           
## metab_103                     -0.4556440692
## metab_104                      0.1410826206
## metab_105                      0.1488938954
## metab_106                      0.1042063463
## metab_107                      0.0148742853
## metab_108                      .           
## metab_109                     -0.2816625205
## metab_110                     -0.1537915440
## metab_111                      .           
## metab_112                      .           
## metab_113                      0.6583017912
## metab_114                      .           
## metab_115                      0.5283110987
## metab_116                      .           
## metab_117                      .           
## metab_118                     -0.3663558171
## metab_119                      .           
## metab_120                     -0.2542869527
## metab_121                      .           
## metab_122                     -0.0008529339
## metab_123                      .           
## metab_124                      .           
## metab_125                     -0.1832087562
## metab_126                      .           
## metab_127                     -0.0236027086
## metab_128                      .           
## metab_129                      .           
## metab_130                      .           
## metab_131                      .           
## metab_132                      .           
## metab_133                     -0.2792184698
## metab_134                      0.3456520222
## metab_135                     -0.2614018516
## metab_136                      .           
## metab_137                     -0.3669591947
## metab_138                     -0.5147132754
## metab_139                      .           
## metab_140                     -0.0099614052
## metab_141                      .           
## metab_142                     -0.6198306025
## metab_143                     -0.2782851220
## metab_144                      0.0048704545
## metab_145                     -0.1710933254
## metab_146                      .           
## metab_147                      0.5387069379
## metab_148                      .           
## metab_149                      .           
## metab_150                      0.2857765620
## metab_151                     -0.0034005540
## metab_152                     -0.0476288643
## metab_153                      .           
## metab_154                     -0.0189652205
## metab_155                     -0.3388510280
## metab_156                      .           
## metab_157                      0.1747906602
## metab_158                      .           
## metab_159                      0.1307044126
## metab_160                     -2.1519852712
## metab_161                      2.4797093234
## metab_162                      .           
## metab_163                      0.5861067228
## metab_164                     -0.0944162816
## metab_165                      .           
## metab_166                     -0.3770777220
## metab_167                     -0.0865760438
## metab_168                      .           
## metab_169                      .           
## metab_170                      .           
## metab_171                     -0.0766589539
## metab_172                      .           
## metab_173                      0.0717348353
## metab_174                      .           
## metab_175                     -0.2054485955
## metab_176                     -0.0672212135
## metab_177                      0.1100128323
lasso_predictions <- predict(lasso_model, s = lasso_model$lambda.min, newx = x_test)

test_mse <- mean((lasso_predictions - y_test)^2)
cat("Mean Squared Error on Test Set:", test_mse, "\n")
## Mean Squared Error on Test Set: 0.7417664

The optimal value of the regularization parameter selected by cross-validation is approximately 0.0067. The Mean Squared Error (MSE) on the test set was 0.742, which is fairly high for a standardized outcome and indicates limited predictive performance.
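Besides lambda.min, cv.glmnet also reports lambda.1se, the largest lambda whose cross-validated error is within one standard error of the minimum. As a sketch (re-using x_train, y_train, and penalty_factors from above), the one-standard-error rule typically yields a noticeably sparser model at little cost in accuracy:

```r
library(glmnet)

# Refit the LASSO with cross-validation (same inputs as above).
cv_fit <- cv.glmnet(x_train, y_train, alpha = 1, family = "gaussian",
                    penalty.factor = penalty_factors)

# lambda.1se is larger than lambda.min, so it shrinks harder and
# retains fewer predictors.
cat("lambda.min:", cv_fit$lambda.min, "\n")
cat("lambda.1se:", cv_fit$lambda.1se, "\n")
cat("Non-zero coefficients at lambda.min:", sum(coef(cv_fit, s = "lambda.min") != 0), "\n")
cat("Non-zero coefficients at lambda.1se:", sum(coef(cv_fit, s = "lambda.1se") != 0), "\n")
```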

To assess the robustness of the model, 10-fold cross-validation was performed.

perform_cv <- function(data, response, k = 10) {
  # make k-folds
  folds <- createFolds(data[[response]], k = k, list = TRUE, returnTrain = TRUE)
  
  mse_values <- c()
  
  for (i in 1:k) {
    train_indices <- folds[[i]]
    train_data <- data[train_indices, ]
    test_data <- data[-train_indices, ]
    
    x_train <- model.matrix(as.formula(paste(response, "~ .")), train_data)[, -1]
    y_train <- train_data[[response]]
    x_test <- model.matrix(as.formula(paste(response, "~ .")), test_data)[, -1]
    y_test <- test_data[[response]]
    
    # fit LASSO with cv.glmnet (penalty_factors is taken from the enclosing environment)
    lasso_model <- cv.glmnet(x_train, y_train, alpha = 1, family = "gaussian", penalty.factor = penalty_factors)
    
    lasso_predictions <- predict(lasso_model, s = lasso_model$lambda.min, newx = x_test)
    
    mse <- mean((lasso_predictions - y_test)^2)
    mse_values <- c(mse_values, mse)
  }
  
  return(mse_values)
}

cv_mse_values <- perform_cv(selected_metabolomics_data, "hs_zbmi_who", k = 10)

# cross-validation results
cat("Cross-Validation Mean Squared Errors:", cv_mse_values, "\n")
## Cross-Validation Mean Squared Errors: 0.7364814 0.7397049 0.5973614 0.6753132 0.6453257 0.6465419 0.7736915 0.7431029 0.6424957 0.6355086
cat("Mean MSE:", mean(cv_mse_values), "\n")
## Mean MSE: 0.6835527
cat("Standard Deviation of MSE:", sd(cv_mse_values), "\n")
## Standard Deviation of MSE: 0.05957963

The average MSE across the folds was 0.684, with a standard deviation of 0.06. These values indicate that the model performs consistently across different subsets of the data, as evidenced by the relatively low standard deviation of the MSE. The average MSE of approximately 0.68 suggests that the model has reasonable predictive accuracy.
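To put these numbers in context, a useful benchmark is the MSE of an intercept-only model, which equals the variance of the outcome. This sketch (assuming selected_metabolomics_data and cv_mse_values exist as above) converts the cross-validated MSE into an approximate proportion of variance explained:

```r
# Intercept-only baseline: predicting the sample mean for everyone.
y <- selected_metabolomics_data$hs_zbmi_who
baseline_mse <- mean((y - mean(y))^2)

cat("Intercept-only MSE:", baseline_mse, "\n")
# If the outcome variance is close to 1 (as expected for a z-score),
# a CV MSE of ~0.68 corresponds to roughly 30% of variance explained.
cat("Approximate cross-validated R-squared:", 1 - mean(cv_mse_values) / baseline_mse, "\n")
```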

Ridge

ridge_model <- cv.glmnet(x_train, y_train, alpha = 0, family = "gaussian", penalty.factor = penalty_factors)
plot(ridge_model)

ridge_model$lambda.min
## [1] 0.1478504
coef(ridge_model, s = ridge_model$lambda.min)
## 236 x 1 sparse Matrix of class "dgCMatrix"
##                                          s1
## (Intercept)                    1.7282177701
## hs_child_age_None             -0.1414592563
## h_cohort2                     -0.0978666452
## h_cohort3                      0.2060658161
## h_cohort4                      0.2452503476
## h_cohort5                      0.0538322470
## h_cohort6                      0.1116108113
## e3_sex_Nonemale                0.2556083925
## e3_yearbir_None2004           -0.1053358742
## e3_yearbir_None2005           -0.0468330017
## e3_yearbir_None2006            0.0443277971
## e3_yearbir_None2007            0.0335364143
## e3_yearbir_None2008           -0.0518178601
## e3_yearbir_None2009            0.4234409077
## hs_cd_c_Log2                   0.0079617361
## hs_co_c_Log2                   0.0067345415
## hs_cs_c_Log2                   0.1160807227
## hs_cu_c_Log2                   0.2008732135
## hs_hg_c_Log2                  -0.0470103142
## hs_mo_c_Log2                  -0.0554079913
## hs_pb_c_Log2                  -0.0147880884
## hs_dde_cadj_Log2              -0.0388878971
## hs_pcb153_cadj_Log2           -0.1818128966
## hs_pcb170_cadj_Log2           -0.0397269342
## hs_dep_cadj_Log2              -0.0113985574
## hs_pbde153_cadj_Log2          -0.0176867599
## hs_pfhxs_c_Log2                0.0070854835
## hs_pfoa_c_Log2                -0.0541885918
## hs_pfos_c_Log2                 0.0073013745
## hs_prpa_cadj_Log2             -0.0090564614
## hs_mbzp_cadj_Log2              0.0494747753
## hs_mibp_cadj_Log2              0.0035633664
## hs_mnbp_cadj_Log2             -0.0237847433
## h_bfdur_Ter(10.8,34.9]         0.1272151954
## h_bfdur_Ter(34.9,Inf]          0.1446408215
## hs_bakery_prod_Ter(2,6]        0.0300821886
## hs_bakery_prod_Ter(6,Inf]     -0.1045958505
## hs_dairy_Ter(14.6,25.6]       -0.0351185562
## hs_dairy_Ter(25.6,Inf]         0.0602100781
## hs_fastfood_Ter(0.132,0.5]    -0.0136301532
## hs_fastfood_Ter(0.5,Inf]      -0.0388752517
## hs_org_food_Ter(0.132,1]       0.0537109988
## hs_org_food_Ter(1,Inf]         0.0302911681
## hs_readymade_Ter(0.132,0.5]    0.1146349170
## hs_readymade_Ter(0.5,Inf]      0.0791774367
## hs_total_bread_Ter(7,17.5]    -0.0180618725
## hs_total_bread_Ter(17.5,Inf]  -0.0093625203
## hs_total_fish_Ter(1.5,3]      -0.0187224833
## hs_total_fish_Ter(3,Inf]      -0.0523869179
## hs_total_fruits_Ter(7,14.1]    0.0991937763
## hs_total_fruits_Ter(14.1,Inf]  0.1264710114
## hs_total_lipids_Ter(3,7]       0.0749322350
## hs_total_lipids_Ter(7,Inf]    -0.0069487824
## hs_total_potatoes_Ter(3,4]     0.0283563351
## hs_total_potatoes_Ter(4,Inf]  -0.0065373154
## hs_total_sweets_Ter(4.1,8.5]  -0.0175070732
## hs_total_sweets_Ter(8.5,Inf]  -0.0016464266
## hs_total_veg_Ter(6,8.5]        0.0424871729
## hs_total_veg_Ter(8.5,Inf]     -0.0115471828
## metab_1                       -0.0341603748
## metab_2                        0.3242593008
## metab_3                        0.1076777830
## metab_4                        0.0221853613
## metab_5                        0.3474616072
## metab_6                       -0.1753759379
## metab_7                        0.0747971195
## metab_8                        0.3630760790
## metab_9                       -0.0486861609
## metab_10                       0.0523991679
## metab_11                       0.1671749775
## metab_12                      -0.1364730801
## metab_13                      -0.0298738292
## metab_14                      -0.5125061837
## metab_15                      -0.0083710729
## metab_16                       0.0305179140
## metab_17                      -0.0191024302
## metab_18                      -0.1450129860
## metab_19                      -0.0813832302
## metab_20                       0.0158096619
## metab_21                       0.2329557622
## metab_22                      -0.2638978096
## metab_23                       0.1378970319
## metab_24                       0.6329125433
## metab_25                      -0.1194619663
## metab_26                      -0.2542476511
## metab_27                       0.3236385699
## metab_28                       0.0610668473
## metab_29                      -0.0766258444
## metab_30                       0.1218806207
## metab_31                       0.0623493326
## metab_32                      -0.1125098637
## metab_33                       0.0065103674
## metab_34                      -0.0483559357
## metab_35                      -0.0141070961
## metab_36                      -0.0561199913
## metab_37                      -0.0956896689
## metab_38                      -0.0635801349
## metab_39                       0.0012247780
## metab_40                       0.3738560597
## metab_41                       0.2655978180
## metab_42                      -0.4203463821
## metab_43                      -0.2045465409
## metab_44                      -0.0940831603
## metab_45                       0.1499580599
## metab_46                      -0.0607099684
## metab_47                       0.4558739707
## metab_48                      -0.5941339477
## metab_49                       0.1444370402
## metab_50                      -0.2750689351
## metab_51                       0.0624718069
## metab_52                       0.4297878903
## metab_53                       0.0410146045
## metab_54                       0.1107534367
## metab_55                       0.0054025799
## metab_56                      -0.1446093111
## metab_57                       0.0940567994
## metab_58                      -0.1390673342
## metab_59                       0.4316589827
## metab_60                      -0.1436043674
## metab_61                       0.0962311832
## metab_62                      -0.0590826805
## metab_63                      -0.1240472953
## metab_64                       0.0615088957
## metab_65                       0.0299361770
## metab_66                      -0.1085502901
## metab_67                      -0.1202622423
## metab_68                       0.0957945397
## metab_69                      -0.0697767416
## metab_70                      -0.0266675354
## metab_71                      -0.1232493184
## metab_72                      -0.0288851430
## metab_73                      -0.0957276198
## metab_74                       0.0225722965
## metab_75                       0.2740076112
## metab_76                      -0.0363207113
## metab_77                       0.0073709132
## metab_78                      -0.2691498337
## metab_79                       0.0358766366
## metab_80                       0.0351731597
## metab_81                       0.1508161004
## metab_82                      -0.3879205270
## metab_83                      -0.1299297970
## metab_84                      -0.1831250285
## metab_85                       0.0020165001
## metab_86                       0.2604223866
## metab_87                       0.1152845933
## metab_88                       0.3415572854
## metab_89                      -0.3066834308
## metab_90                      -0.0532296831
## metab_91                       0.1280527062
## metab_92                       0.1044026773
## metab_93                      -0.0123727721
## metab_94                      -0.0837918551
## metab_95                       0.7271570071
## metab_96                       0.3018352357
## metab_97                      -0.1561866063
## metab_98                      -0.0376285257
## metab_99                      -0.4088325961
## metab_100                      0.3822487443
## metab_101                      0.0946370318
## metab_102                      0.0801208308
## metab_103                     -0.2204341324
## metab_104                      0.1999043304
## metab_105                      0.1284995841
## metab_106                      0.0808988863
## metab_107                      0.1028527821
## metab_108                     -0.0167278852
## metab_109                     -0.2114040747
## metab_110                     -0.2382235667
## metab_111                     -0.0803962957
## metab_112                      0.0634550249
## metab_113                      0.4998148884
## metab_114                      0.0577121726
## metab_115                      0.4277643966
## metab_116                     -0.0005507927
## metab_117                     -0.1871162791
## metab_118                     -0.1477652039
## metab_119                      0.0616771132
## metab_120                     -0.2997882757
## metab_121                      0.0627850814
## metab_122                     -0.2495761638
## metab_123                     -0.1120351659
## metab_124                      0.0356298686
## metab_125                     -0.1740107448
## metab_126                      0.0240784648
## metab_127                     -0.0260325380
## metab_128                     -0.0755103332
## metab_129                      0.0818975104
## metab_130                     -0.1831489109
## metab_131                     -0.0371682604
## metab_132                      0.0402819395
## metab_133                     -0.2564251045
## metab_134                      0.2063246686
## metab_135                     -0.1000596726
## metab_136                     -0.1918288648
## metab_137                     -0.2947151246
## metab_138                     -0.2960643918
## metab_139                     -0.0331791841
## metab_140                     -0.0867055515
## metab_141                     -0.0873270681
## metab_142                     -0.3278292894
## metab_143                     -0.2360060518
## metab_144                      0.1307733498
## metab_145                     -0.2372306365
## metab_146                     -0.0268332225
## metab_147                      0.2922168591
## metab_148                      0.0382837084
## metab_149                      0.0520729402
## metab_150                      0.2476273071
## metab_151                      0.0119159957
## metab_152                     -0.0514125800
## metab_153                     -0.0840271322
## metab_154                     -0.0312170974
## metab_155                     -0.2245886891
## metab_156                     -0.0457999554
## metab_157                      0.1180563546
## metab_158                      0.1518303829
## metab_159                      0.1879475207
## metab_160                     -0.9059458692
## metab_161                      1.3011594090
## metab_162                     -0.0228710208
## metab_163                      0.6973030067
## metab_164                     -0.0970072803
## metab_165                     -0.0459831397
## metab_166                     -0.3388498763
## metab_167                     -0.1053059991
## metab_168                     -0.0612715126
## metab_169                      0.0257426951
## metab_170                     -0.0082334426
## metab_171                     -0.0405670834
## metab_172                     -0.0371720667
## metab_173                      0.1221819645
## metab_174                     -0.1119440302
## metab_175                     -0.1677458160
## metab_176                     -0.0799237505
## metab_177                      0.1414407707
predictions <- predict(ridge_model, s = ridge_model$lambda.min, newx = x_test)

test_mse <- mean((predictions - y_test)^2)
cat("Mean Squared Error on Test Set:", test_mse, "\n")
## Mean Squared Error on Test Set: 0.7391414

The optimal value of the regularization parameter (lambda.min) selected by cross-validation is approximately 0.148, considerably larger than for LASSO, which is typical since Ridge shrinks all coefficients rather than dropping any.

The test set MSE (0.739) is nearly identical to LASSO's (0.742), with Ridge performing marginally better.

perform_cv <- function(data, response, k = 10) {
  # make k-folds
  folds <- createFolds(data[[response]], k = k, list = TRUE, returnTrain = TRUE)
  
  mse_values <- c()
  
  for (i in 1:k) {
    train_indices <- folds[[i]]
    train_data <- data[train_indices, ]
    test_data <- data[-train_indices, ]
    
    x_train <- model.matrix(as.formula(paste(response, "~ .")), train_data)[, -1]
    y_train <- train_data[[response]]
    x_test <- model.matrix(as.formula(paste(response, "~ .")), test_data)[, -1]
    y_test <- test_data[[response]]
    
    # fit ridge with cv.glmnet (penalty_factors is taken from the enclosing environment)
    ridge_model <- cv.glmnet(x_train, y_train, alpha = 0, family = "gaussian", penalty.factor = penalty_factors)
    
    ridge_predictions <- predict(ridge_model, s = ridge_model$lambda.min, newx = x_test)
    
    mse <- mean((ridge_predictions - y_test)^2)
    mse_values <- c(mse_values, mse)
  }
  
  return(mse_values)
}

cv_mse_values <- perform_cv(selected_metabolomics_data, "hs_zbmi_who", k = 10)

# cross-validation results
cat("Cross-Validation Mean Squared Errors:", cv_mse_values, "\n")
## Cross-Validation Mean Squared Errors: 0.5416315 0.7609588 0.7226972 0.5760888 0.658409 0.6398092 0.8116814 0.8872555 0.6814814 0.7203786
cat("Mean MSE:", mean(cv_mse_values), "\n")
## Mean MSE: 0.7000391
cat("Standard Deviation of MSE:", sd(cv_mse_values), "\n")
## Standard Deviation of MSE: 0.1045169

The ridge regression model’s test set MSE of 0.7391414 is slightly higher than the mean MSE from cross-validation (0.7000391), but well within one standard deviation of it, suggesting that the model generalizes reasonably well to new data. The standard deviation of 0.1045169 indicates moderate variability across folds: the model is relatively stable but somewhat sensitive to the specific data used for training and testing.

Compared to the LASSO model, which had a test set MSE of 0.7417664 and a cross-validation mean MSE of 0.6835527, the ridge regression model has a slightly lower test set MSE but a higher mean MSE from cross-validation. The two models perform similarly overall; ridge may handle the strong multicollinearity among metabolites better because it shrinks all coefficients toward zero rather than excluding correlated predictors outright.
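The comparison can be summarized side by side; the values below are copied from the output shown earlier in this section:

```r
# Summary of the two penalized models fitted so far.
comparison <- data.frame(
  model    = c("LASSO", "Ridge"),
  test_mse = c(0.7417664, 0.7391414),
  cv_mean  = c(0.6835527, 0.7000391),
  cv_sd    = c(0.0595796, 0.1045169)
)
print(comparison)
```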

Elastic Net

enet_model <- cv.glmnet(x_train, y_train, alpha = 0.5, family = "gaussian", penalty.factor = penalty_factors)
plot(enet_model)

enet_model$lambda.min
## [1] 0.01113295
coef(enet_model, s = enet_model$lambda.min)
## 236 x 1 sparse Matrix of class "dgCMatrix"
##                                         s1
## (Intercept)                    9.495312982
## hs_child_age_None             -0.175051038
## h_cohort2                     -0.050198059
## h_cohort3                      0.279050436
## h_cohort4                      0.281935768
## h_cohort5                      0.014566913
## h_cohort6                      0.074774181
## e3_sex_Nonemale                0.336945087
## e3_yearbir_None2004           -0.105849441
## e3_yearbir_None2005           -0.032233895
## e3_yearbir_None2006            0.031079227
## e3_yearbir_None2007            0.026477775
## e3_yearbir_None2008           -0.058697269
## e3_yearbir_None2009            0.374806703
## hs_cd_c_Log2                   0.006306628
## hs_co_c_Log2                   .          
## hs_cs_c_Log2                   0.115206826
## hs_cu_c_Log2                   0.158402617
## hs_hg_c_Log2                  -0.050678712
## hs_mo_c_Log2                  -0.048451563
## hs_pb_c_Log2                  -0.001773176
## hs_dde_cadj_Log2              -0.021678293
## hs_pcb153_cadj_Log2           -0.214823246
## hs_pcb170_cadj_Log2           -0.033536167
## hs_dep_cadj_Log2              -0.010667085
## hs_pbde153_cadj_Log2          -0.016078241
## hs_pfhxs_c_Log2                0.003151061
## hs_pfoa_c_Log2                -0.031409856
## hs_pfos_c_Log2                 0.011230231
## hs_prpa_cadj_Log2             -0.010251744
## hs_mbzp_cadj_Log2              0.052812544
## hs_mibp_cadj_Log2              .          
## hs_mnbp_cadj_Log2             -0.004342600
## h_bfdur_Ter(10.8,34.9]         0.101836788
## h_bfdur_Ter(34.9,Inf]          0.124268907
## hs_bakery_prod_Ter(2,6]        0.014718156
## hs_bakery_prod_Ter(6,Inf]     -0.089513468
## hs_dairy_Ter(14.6,25.6]       -0.011569290
## hs_dairy_Ter(25.6,Inf]         0.069685584
## hs_fastfood_Ter(0.132,0.5]     .          
## hs_fastfood_Ter(0.5,Inf]      -0.023076763
## hs_org_food_Ter(0.132,1]       0.049883200
## hs_org_food_Ter(1,Inf]         0.028245012
## hs_readymade_Ter(0.132,0.5]    0.085733495
## hs_readymade_Ter(0.5,Inf]      0.056002388
## hs_total_bread_Ter(7,17.5]     .          
## hs_total_bread_Ter(17.5,Inf]   .          
## hs_total_fish_Ter(1.5,3]       .          
## hs_total_fish_Ter(3,Inf]      -0.052169723
## hs_total_fruits_Ter(7,14.1]    0.087447656
## hs_total_fruits_Ter(14.1,Inf]  0.126392816
## hs_total_lipids_Ter(3,7]       0.061389352
## hs_total_lipids_Ter(7,Inf]     .          
## hs_total_potatoes_Ter(3,4]     0.011362234
## hs_total_potatoes_Ter(4,Inf]   .          
## hs_total_sweets_Ter(4.1,8.5]   .          
## hs_total_sweets_Ter(8.5,Inf]   .          
## hs_total_veg_Ter(6,8.5]        0.028365554
## hs_total_veg_Ter(8.5,Inf]     -0.006869029
## metab_1                       -0.024913517
## metab_2                        0.090613749
## metab_3                        0.039079837
## metab_4                        0.011454360
## metab_5                        0.454126711
## metab_6                       -0.111600339
## metab_7                        0.004497640
## metab_8                        0.236490674
## metab_9                        .          
## metab_10                       0.030645423
## metab_11                       0.196644274
## metab_12                      -0.158918884
## metab_13                       .          
## metab_14                      -0.460893856
## metab_15                       .          
## metab_16                       .          
## metab_17                       .          
## metab_18                      -0.186430838
## metab_19                       .          
## metab_20                       .          
## metab_21                       0.060409783
## metab_22                      -0.275375448
## metab_23                       0.149442881
## metab_24                       0.641889829
## metab_25                      -0.133451466
## metab_26                      -0.242185369
## metab_27                       0.510277342
## metab_28                       .          
## metab_29                      -0.030263084
## metab_30                       0.134219897
## metab_31                       0.057298020
## metab_32                      -0.138411041
## metab_33                       .          
## metab_34                      -0.021686085
## metab_35                       .          
## metab_36                       .          
## metab_37                      -0.029487859
## metab_38                      -0.056351692
## metab_39                       .          
## metab_40                       0.096513093
## metab_41                       0.269779946
## metab_42                      -0.419822483
## metab_43                      -0.178793620
## metab_44                      -0.053973699
## metab_45                       0.138508878
## metab_46                       .          
## metab_47                       0.478431813
## metab_48                      -0.759632270
## metab_49                       0.118570898
## metab_50                      -0.216858557
## metab_51                       .          
## metab_52                       0.439945719
## metab_53                       .          
## metab_54                       0.123900271
## metab_55                       .          
## metab_56                      -0.137801788
## metab_57                       .          
## metab_58                       .          
## metab_59                       0.621236836
## metab_60                      -0.159787847
## metab_61                       .          
## metab_62                       .          
## metab_63                      -0.163980737
## metab_64                       .          
## metab_65                       .          
## metab_66                      -0.091101925
## metab_67                      -0.248961219
## metab_68                       0.140190606
## metab_69                      -0.082507800
## metab_70                       .          
## metab_71                      -0.074575148
## metab_72                       .          
## metab_73                      -0.129126292
## metab_74                       .          
## metab_75                       0.317490644
## metab_76                       .          
## metab_77                       0.013540724
## metab_78                      -0.158348186
## metab_79                       .          
## metab_80                       .          
## metab_81                       .          
## metab_82                      -0.681574449
## metab_83                       .          
## metab_84                      -0.151896358
## metab_85                       .          
## metab_86                       0.408101094
## metab_87                       0.083878162
## metab_88                       0.605059032
## metab_89                      -1.179469580
## metab_90                       .          
## metab_91                       0.141133384
## metab_92                       0.123188890
## metab_93                       .          
## metab_94                      -0.064988424
## metab_95                       1.535490338
## metab_96                       0.027829919
## metab_97                       .          
## metab_98                       .          
## metab_99                      -0.512378916
## metab_100                      0.588689603
## metab_101                      .          
## metab_102                      .          
## metab_103                     -0.430897435
## metab_104                      0.167340158
## metab_105                      0.166647728
## metab_106                      0.127470318
## metab_107                      0.041809372
## metab_108                      .          
## metab_109                     -0.267986163
## metab_110                     -0.179603427
## metab_111                      .          
## metab_112                      .          
## metab_113                      0.675209227
## metab_114                      .          
## metab_115                      0.541169298
## metab_116                      .          
## metab_117                      .          
## metab_118                     -0.343920647
## metab_119                      .          
## metab_120                     -0.270942077
## metab_121                      .          
## metab_122                     -0.032978428
## metab_123                      .          
## metab_124                      .          
## metab_125                     -0.225286580
## metab_126                      .          
## metab_127                     -0.023241158
## metab_128                     -0.002566352
## metab_129                      .          
## metab_130                      .          
## metab_131                      .          
## metab_132                      .          
## metab_133                     -0.303885414
## metab_134                      0.379244640
## metab_135                     -0.259400568
## metab_136                      .          
## metab_137                     -0.407203906
## metab_138                     -0.556456703
## metab_139                      .          
## metab_140                     -0.007072434
## metab_141                      .          
## metab_142                     -0.618376472
## metab_143                     -0.295905059
## metab_144                      0.052802863
## metab_145                     -0.200535780
## metab_146                     -0.016753827
## metab_147                      0.542928496
## metab_148                      .          
## metab_149                      .          
## metab_150                      0.281600203
## metab_151                     -0.005096803
## metab_152                     -0.047714499
## metab_153                      .          
## metab_154                     -0.018143848
## metab_155                     -0.400759607
## metab_156                      .          
## metab_157                      0.202961184
## metab_158                      .          
## metab_159                      0.152834187
## metab_160                     -2.000322927
## metab_161                      2.377151821
## metab_162                      .          
## metab_163                      0.631069072
## metab_164                     -0.105163219
## metab_165                      .          
## metab_166                     -0.430394245
## metab_167                     -0.095824903
## metab_168                     -0.012670429
## metab_169                      .          
## metab_170                      .          
## metab_171                     -0.079141688
## metab_172                      .          
## metab_173                      0.093863160
## metab_174                      .          
## metab_175                     -0.218116361
## metab_176                     -0.078842439
## metab_177                      0.134229019
predictions <- predict(enet_model, s = enet_model$lambda.min, newx = x_test)

test_mse <- mean((predictions - y_test)^2)
cat("Mean Squared Error on Test Set:", test_mse, "\n")
## Mean Squared Error on Test Set: 0.7391896

The MSE on the test set was 0.7391896.
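The dots in the coefficient printout above mark predictors shrunk exactly to zero. A minimal sketch for pulling out only the retained predictors (assuming enet_model is the cv.glmnet fit used above):

```r
# List the predictors the Elastic Net kept, i.e. those whose
# coefficients were not shrunk exactly to zero at lambda.min.
enet_coefs <- coef(enet_model, s = enet_model$lambda.min)
nonzero <- rownames(enet_coefs)[as.vector(enet_coefs != 0)]
selected_vars <- setdiff(nonzero, "(Intercept)")
cat("Predictors retained at lambda.min:", length(selected_vars), "\n")
```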

perform_cv <- function(data, response, k = 10) {
  # make k-folds
  folds <- createFolds(data[[response]], k = k, list = TRUE, returnTrain = TRUE)
  
  mse_values <- c()
  
  for (i in 1:k) {
    train_indices <- folds[[i]]
    train_data <- data[train_indices, ]
    test_data <- data[-train_indices, ]
    
    x_train <- model.matrix(as.formula(paste(response, "~ .")), train_data)[, -1]
    y_train <- train_data[[response]]
    x_test <- model.matrix(as.formula(paste(response, "~ .")), test_data)[, -1]
    y_test <- test_data[[response]]
    
    # add enet with cv.glmnet
    enet_model <- cv.glmnet(x_train, y_train, alpha = 0.5, family = "gaussian", penalty.factor = penalty_factors)
    
    # use the elastic net's own lambda.min (not ridge_model's)
    enet_predictions <- predict(enet_model, s = enet_model$lambda.min, newx = x_test)
    
    mse <- mean((enet_predictions - y_test)^2)
    mse_values <- c(mse_values, mse)
  }
  
  return(mse_values)
}

cv_mse_values <- perform_cv(selected_metabolomics_data, "hs_zbmi_who", k = 10)

# cross-validation results
cat("Cross-Validation Mean Squared Errors:", cv_mse_values, "\n")
## Cross-Validation Mean Squared Errors: 0.9191387 0.9944847 1.059095 0.718622 0.9920714 1.024423 0.8691982 0.9626139 0.9308513 0.869777
cat("Mean MSE:", mean(cv_mse_values), "\n")
## Mean MSE: 0.9340275
cat("Standard Deviation of MSE:", sd(cv_mse_values), "\n")
## Standard Deviation of MSE: 0.0981024

The Elastic Net model’s test set MSE of 0.7391896 is noticeably lower than the mean MSE from cross-validation (0.9340275). The cross-validation estimate is the more robust gauge of generalization, so the gap suggests the single train/test split was somewhat favorable. The standard deviation of 0.0981024 indicates moderate variability in performance across folds, meaning the model is relatively stable but retains some sensitivity to the specific data used for training and testing.

The Elastic Net model performs comparably to the LASSO and Ridge models in predictive accuracy, with a slightly lower test set MSE. Its higher mean cross-validated MSE, however, suggests some sensitivity to how the data are split.
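To put the fold-to-fold variability on a common footing, a rough interval for the mean cross-validated MSE can be computed from the fold-level values (cv_mse_values) above. This is only approximate, since fold MSEs are not fully independent:

```r
# Rough 95% interval for the mean cross-validated MSE,
# treating the k fold-level MSEs as an i.i.d. sample.
k <- length(cv_mse_values)
se <- sd(cv_mse_values) / sqrt(k)
ci <- mean(cv_mse_values) + c(-1, 1) * qt(0.975, df = k - 1) * se
cat("Approx. 95% CI for mean CV MSE:", round(ci, 3), "\n")
```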

Decision Trees

set.seed(101)
tree_model <- rpart(hs_zbmi_who ~ ., data = train_data, method = "anova")
rpart.plot(tree_model)

tree_predictions <- predict(tree_model, newdata = test_data)
tree_mse <- mean((tree_predictions - y_test)^2)
cat("Decision Tree Mean Squared Error on Test Set:", tree_mse, "\n")
## Decision Tree Mean Squared Error on Test Set: 1.545318
perform_cv_dt <- function(data, response, k = 10) {
    folds <- createFolds(data[[response]], k = k, list = TRUE, returnTrain = TRUE)
    
    mse_values <- c()
    
    for (i in 1:k) {
        train_indices <- folds[[i]]
        train_data <- data[train_indices, ]
        test_data <- data[-train_indices, ]
        
        dt_model <- rpart(as.formula(paste(response, "~ .")), data = train_data, method = "anova")
        dt_predictions <- predict(dt_model, newdata = test_data)
        y_test <- test_data[[response]]
        
        mse <- mean((dt_predictions - y_test)^2)
        mse_values <- c(mse_values, mse)
    }
    
    return(mse_values)
}

# Perform external cross-validation
cv_dt_mse_values <- perform_cv_dt(selected_metabolomics_data, "hs_zbmi_who", k = 10)

# Print results
cat("Cross-Validation Mean Squared Errors for Decision Tree:", cv_dt_mse_values, "\n")
## Cross-Validation Mean Squared Errors for Decision Tree: 1.444271 1.535309 1.341503 1.175869 1.382381 1.282105 1.479779 1.437367 1.154078 1.424771
cat("Mean MSE for Decision Tree:", mean(cv_dt_mse_values), "\n")
## Mean MSE for Decision Tree: 1.365743
cat("Standard Deviation of MSE for Decision Tree:", sd(cv_dt_mse_values), "\n")
## Standard Deviation of MSE for Decision Tree: 0.1270397

The decision tree model’s test set MSE of 1.545318 is substantially higher than that of the Elastic Net, Ridge, and LASSO models, indicating that a single decision tree is the least accurate of these models at predicting BMI Z-scores.

The standard deviation of 0.1270397 indicates moderate variability in model performance across different subsets of the data. This suggests that the model’s performance is somewhat consistent but still varies based on the specific data used for training and testing.

The decision tree model performs worse than the other models (LASSO, Ridge, and Elastic Net) in terms of predictive accuracy, as evidenced by the higher MSE values. The higher test set MSE and mean MSE from cross-validation indicate that the decision tree model is less suitable for this particular prediction task.
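One way to reduce a single tree’s overfitting is to prune it back. A minimal sketch (assuming tree_model, test_data, and y_test from the chunks above) that prunes at the complexity parameter with the lowest cross-validated error from rpart’s internal CV:

```r
# Prune at the cp with the lowest cross-validated error (xerror);
# rpart stores the CV results in the fitted object's cptable.
best_cp <- tree_model$cptable[which.min(tree_model$cptable[, "xerror"]), "CP"]
pruned_tree <- prune(tree_model, cp = best_cp)
pruned_predictions <- predict(pruned_tree, newdata = test_data)
cat("Pruned tree test MSE:", mean((pruned_predictions - y_test)^2), "\n")
```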

Random Forest

set.seed(101)
rf_model <- randomForest(hs_zbmi_who ~ . , data = train_data, ntree = 500)

rf_predictions <- predict(rf_model, newdata = test_data)

rf_mse <- mean((rf_predictions - y_test)^2)
cat("Random Forest Mean Squared Error on Test Set:", rf_mse, "\n")
## Random Forest Mean Squared Error on Test Set: 1.005087
importance(rf_model)
##                       IncNodePurity
## hs_child_age_None         4.3510904
## h_cohort                 11.4847925
## e3_sex_None               0.6469470
## e3_yearbir_None           5.0397600
## hs_cd_c_Log2              5.4034834
## hs_co_c_Log2              5.0423993
## hs_cs_c_Log2              4.6266269
## hs_cu_c_Log2             11.3629923
## hs_hg_c_Log2              5.3986631
## hs_mo_c_Log2             10.5561526
## hs_pb_c_Log2              6.0375115
## hs_dde_cadj_Log2         15.2674613
## hs_pcb153_cadj_Log2      43.7859103
## hs_pcb170_cadj_Log2      77.4592518
## hs_dep_cadj_Log2          6.9179021
## hs_pbde153_cadj_Log2     29.2165376
## hs_pfhxs_c_Log2           5.5574508
## hs_pfoa_c_Log2            9.1473793
## hs_pfos_c_Log2            6.2406329
## hs_prpa_cadj_Log2         5.5785583
## hs_mbzp_cadj_Log2         4.8381018
## hs_mibp_cadj_Log2         4.2835618
## hs_mnbp_cadj_Log2         4.0952772
## h_bfdur_Ter               3.0556126
## hs_bakery_prod_Ter        2.8734807
## hs_dairy_Ter              1.1982993
## hs_fastfood_Ter           0.7788861
## hs_org_food_Ter           1.0689860
## hs_readymade_Ter          1.7494233
## hs_total_bread_Ter        1.3698662
## hs_total_fish_Ter         1.3784976
## hs_total_fruits_Ter       1.1543863
## hs_total_lipids_Ter       1.0160675
## hs_total_potatoes_Ter     1.5864111
## hs_total_sweets_Ter       1.0535255
## hs_total_veg_Ter          1.2142266
## metab_1                   5.0246178
## metab_2                   4.9975168
## metab_3                   3.4643572
## metab_4                   5.5046649
## metab_5                   3.9463992
## metab_6                   7.3085806
## metab_7                   4.4257574
## metab_8                  31.8547065
## metab_9                   2.9959180
## metab_10                  3.1758405
## metab_11                  3.8734754
## metab_12                  3.2738085
## metab_13                  5.0382976
## metab_14                  4.8505501
## metab_15                  4.6498518
## metab_16                  2.8186537
## metab_17                  2.5814905
## metab_18                  3.4531828
## metab_19                  2.2677950
## metab_20                  3.5314013
## metab_21                  2.2608049
## metab_22                  2.5923937
## metab_23                  2.9864977
## metab_24                  3.7658293
## metab_25                  3.4717235
## metab_26                  7.5922897
## metab_27                  3.3079110
## metab_28                  3.0942746
## metab_29                  3.4194697
## metab_30                 19.0900288
## metab_31                  3.7296241
## metab_32                  2.9438450
## metab_33                  4.7884692
## metab_34                  2.5374169
## metab_35                  7.8801757
## metab_36                  3.5484238
## metab_37                  3.4615654
## metab_38                  2.8551929
## metab_39                  2.9023861
## metab_40                  5.5303117
## metab_41                  4.1589018
## metab_42                  5.7814479
## metab_43                  2.9051396
## metab_44                  3.1993167
## metab_45                  3.8627595
## metab_46                  4.7165866
## metab_47                  6.5126371
## metab_48                 11.6532289
## metab_49                 34.0040406
## metab_50                 10.2784818
## metab_51                  5.5997592
## metab_52                  3.5422826
## metab_53                  5.1768364
## metab_54                  4.5873675
## metab_55                  8.5822696
## metab_56                  3.5655906
## metab_57                  4.6641997
## metab_58                  3.3604162
## metab_59                  5.7724297
## metab_60                  4.8328602
## metab_61                  3.5217162
## metab_62                  3.6407068
## metab_63                  4.4807353
## metab_64                  4.3958412
## metab_65                  3.2690407
## metab_66                  2.6464596
## metab_67                  2.9324104
## metab_68                  3.9480657
## metab_69                  2.6555957
## metab_70                  2.6514940
## metab_71                  4.2321834
## metab_72                  3.7483895
## metab_73                  3.5601435
## metab_74                  2.5232306
## metab_75                  4.1790361
## metab_76                  2.3825409
## metab_77                  4.5203448
## metab_78                  4.2542324
## metab_79                  3.9202219
## metab_80                  3.5883304
## metab_81                  3.3478113
## metab_82                  5.2828197
## metab_83                  4.0533988
## metab_84                  3.3750068
## metab_85                  5.5406981
## metab_86                  3.5952497
## metab_87                  3.1383387
## metab_88                  2.6259304
## metab_89                  2.6560929
## metab_90                  2.6864011
## metab_91                  2.6201838
## metab_92                  3.0571047
## metab_93                  3.2157707
## metab_94                  9.3374896
## metab_95                 52.7476025
## metab_96                  7.0768415
## metab_97                  3.4158131
## metab_98                  3.4086182
## metab_99                  5.7843825
## metab_100                 3.7188542
## metab_101                 2.7941056
## metab_102                 4.9365328
## metab_103                 3.6992223
## metab_104                 4.7269214
## metab_105                 3.6214074
## metab_106                 3.6727061
## metab_107                 3.6544399
## metab_108                 3.2550617
## metab_109                 5.4854345
## metab_110                 7.4270442
## metab_111                 2.6425116
## metab_112                 2.7255616
## metab_113                 5.4538156
## metab_114                 3.3385618
## metab_115                 5.2528540
## metab_116                 4.6202248
## metab_117                 6.6989808
## metab_118                 2.5575078
## metab_119                 6.1560378
## metab_120                 7.0372080
## metab_121                 4.0063548
## metab_122                 6.9004492
## metab_123                 2.9740618
## metab_124                 3.8403649
## metab_125                 3.0350595
## metab_126                 2.5287840
## metab_127                 6.3605217
## metab_128                 6.3535024
## metab_129                 3.6497741
## metab_130                 3.5048690
## metab_131                 2.7358289
## metab_132                 3.1416537
## metab_133                 2.8089910
## metab_134                 3.8237787
## metab_135                 4.2005101
## metab_136                 5.3871803
## metab_137                 7.2446208
## metab_138                 5.6624417
## metab_139                 3.1680018
## metab_140                 3.2342640
## metab_141                 7.0874046
## metab_142                13.2308029
## metab_143                 8.1427795
## metab_144                 3.6521039
## metab_145                 3.9931242
## metab_146                 4.1109424
## metab_147                 3.7136688
## metab_148                 3.4295763
## metab_149                 5.0799668
## metab_150                 5.2297917
## metab_151                 3.6133544
## metab_152                 4.4017111
## metab_153                 4.1872432
## metab_154                 4.0488667
## metab_155                 2.6370643
## metab_156                 2.6304464
## metab_157                 3.3068935
## metab_158                 3.3021198
## metab_159                 2.8824300
## metab_160                 8.1179141
## metab_161                26.5008653
## metab_162                 3.7626315
## metab_163                16.1969248
## metab_164                 6.8100777
## metab_165                 3.6030471
## metab_166                 3.9155699
## metab_167                 3.3543066
## metab_168                 3.0729228
## metab_169                 4.0738976
## metab_170                 4.7208531
## metab_171                 4.1684339
## metab_172                 3.6712673
## metab_173                 4.0493143
## metab_174                 4.0874861
## metab_175                 4.4045373
## metab_176                 5.7419058
## metab_177                14.8456884
varImpPlot(rf_model)

The most important variables were hs_pcb170_cadj_Log2, hs_pcb153_cadj_Log2, hs_dde_cadj_Log2, and metab_95.
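The importance shown above is node purity (IncNodePurity), which can be biased toward predictors with many split points. A sketch of the alternative, permutation-based measure, which requires refitting with importance = TRUE:

```r
# Refit to obtain permutation-based %IncMSE alongside node purity;
# %IncMSE measures the increase in OOB error when a variable is permuted.
set.seed(101)
rf_perm <- randomForest(hs_zbmi_who ~ ., data = train_data,
                        ntree = 500, importance = TRUE)
imp <- importance(rf_perm, type = 1)  # type = 1 -> %IncMSE
head(imp[order(-imp[, 1]), , drop = FALSE], 10)
```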

The Random Forest model’s test set MSE of 1.005087 is lower than the Decision Tree model but higher than LASSO, Ridge, and Elastic Net models. This indicates that while Random Forest performs better than a single decision tree, it still has a higher prediction error compared to regularized linear models.

As the cross-validation below shows, the standard deviation of 0.1156282 indicates moderate variability in model performance across folds. This suggests that the model’s performance is relatively consistent but varies slightly based on the specific data used for training and testing.

set.seed(101)

perform_cv_rf <- function(data, response, k = 10, ntree = 500) {
  folds <- createFolds(data[[response]], k = k, list = TRUE, returnTrain = TRUE)
  
  mse_values <- c()
  
  for (i in 1:k) {
    train_indices <- folds[[i]]
    train_data <- data[train_indices, ]
    test_data <- data[-train_indices, ]
    
    rf_model <- randomForest(as.formula(paste(response, "~ .")), data = train_data, ntree = ntree)
    rf_predictions <- predict(rf_model, newdata = test_data)
    y_test <- test_data[[response]]
    
    mse <- mean((rf_predictions - y_test)^2)
    mse_values <- c(mse_values, mse)
  }
  
  return(mse_values)
}

cv_rf_mse_values <- perform_cv_rf(selected_metabolomics_data, "hs_zbmi_who", k = 10, ntree = 500)

cat("Cross-Validation Mean Squared Errors for Random Forest:", cv_rf_mse_values, "\n")
## Cross-Validation Mean Squared Errors for Random Forest: 0.9862609 1.305689 0.9398012 1.005232 1.00973 1.004019 1.151253 0.9302271 0.9633375 0.9700484
cat("Mean MSE for Random Forest:", mean(cv_rf_mse_values), "\n")
## Mean MSE for Random Forest: 1.02656
cat("Standard Deviation of MSE for Random Forest:", sd(cv_rf_mse_values), "\n")
## Standard Deviation of MSE for Random Forest: 0.1156282

The Random Forest model performs better than the Decision Tree model but worse than the LASSO, Ridge, and Elastic Net models in terms of predictive accuracy. The relatively high test set MSE and cross-validation mean MSE indicate that the Random Forest model may not capture the complexity of the relationships in the data as effectively as the regularized linear models. However, the Random Forest model does provide insights into the importance of various predictor variables, which can be valuable for understanding the factors influencing BMI Z-scores.
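The Random Forest above uses the default mtry (floor(p / 3) for regression), which may be suboptimal for this many correlated predictors. A sketch comparing out-of-bag (OOB) error across a few illustrative mtry values:

```r
# Compare OOB error across candidate mtry values; grid is illustrative.
set.seed(101)
mtry_grid <- c(20, 50, 80, 110)
oob_mse <- sapply(mtry_grid, function(m) {
  fit <- randomForest(hs_zbmi_who ~ ., data = train_data,
                      ntree = 500, mtry = m)
  tail(fit$mse, 1)  # OOB MSE after the last tree
})
data.frame(mtry = mtry_grid, oob_mse = oob_mse)
```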

GBM

set.seed(101)
gbm_model <- gbm(hs_zbmi_who ~ ., data = train_data, 
                 distribution = "gaussian",
                 n.trees = 1000,
                 interaction.depth = 3,
                 n.minobsinnode = 10,
                 shrinkage = 0.01,
                 cv.folds = 10,
                 verbose = TRUE)
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.4855             nan     0.0100    0.0043
##      2        1.4808             nan     0.0100    0.0029
##      3        1.4762             nan     0.0100    0.0034
##      4        1.4715             nan     0.0100    0.0023
##      5        1.4660             nan     0.0100    0.0036
##      6        1.4610             nan     0.0100    0.0031
##      7        1.4551             nan     0.0100    0.0044
##      8        1.4506             nan     0.0100    0.0031
##      9        1.4447             nan     0.0100    0.0037
##     10        1.4398             nan     0.0100    0.0027
##     20        1.3938             nan     0.0100    0.0029
##     40        1.3190             nan     0.0100    0.0018
##     60        1.2503             nan     0.0100    0.0021
##     80        1.1923             nan     0.0100    0.0014
##    100        1.1401             nan     0.0100    0.0015
##    120        1.0938             nan     0.0100    0.0011
##    140        1.0500             nan     0.0100    0.0015
##    160        1.0105             nan     0.0100    0.0015
##    180        0.9745             nan     0.0100    0.0011
##    200        0.9403             nan     0.0100    0.0006
##    220        0.9095             nan     0.0100    0.0003
##    240        0.8812             nan     0.0100    0.0002
##    260        0.8541             nan     0.0100    0.0008
##    280        0.8285             nan     0.0100    0.0007
##    300        0.8047             nan     0.0100   -0.0000
##    320        0.7803             nan     0.0100    0.0003
##    340        0.7575             nan     0.0100    0.0003
##    360        0.7367             nan     0.0100   -0.0002
##    380        0.7178             nan     0.0100    0.0003
##    400        0.6997             nan     0.0100    0.0002
##    420        0.6820             nan     0.0100    0.0003
##    440        0.6651             nan     0.0100    0.0000
##    460        0.6485             nan     0.0100    0.0002
##    480        0.6327             nan     0.0100    0.0001
##    500        0.6195             nan     0.0100    0.0002
##    520        0.6064             nan     0.0100   -0.0001
##    540        0.5933             nan     0.0100    0.0005
##    560        0.5802             nan     0.0100    0.0001
##    580        0.5679             nan     0.0100   -0.0002
##    600        0.5560             nan     0.0100   -0.0001
##    620        0.5447             nan     0.0100   -0.0001
##    640        0.5340             nan     0.0100   -0.0001
##    660        0.5240             nan     0.0100   -0.0003
##    680        0.5143             nan     0.0100   -0.0001
##    700        0.5045             nan     0.0100    0.0000
##    720        0.4950             nan     0.0100   -0.0001
##    740        0.4859             nan     0.0100    0.0001
##    760        0.4771             nan     0.0100   -0.0001
##    780        0.4686             nan     0.0100   -0.0003
##    800        0.4601             nan     0.0100    0.0002
##    820        0.4525             nan     0.0100   -0.0001
##    840        0.4452             nan     0.0100   -0.0002
##    860        0.4375             nan     0.0100   -0.0000
##    880        0.4304             nan     0.0100    0.0000
##    900        0.4231             nan     0.0100   -0.0001
##    920        0.4158             nan     0.0100    0.0001
##    940        0.4089             nan     0.0100   -0.0000
##    960        0.4021             nan     0.0100   -0.0000
##    980        0.3954             nan     0.0100   -0.0000
##   1000        0.3895             nan     0.0100   -0.0002
best_trees <- gbm.perf(gbm_model, method = "cv")

gbm_predictions <- predict(gbm_model, newdata = test_data, n.trees = best_trees)
gbm_mse <- mean((gbm_predictions - y_test)^2)
cat("GBM Mean Squared Error on Test Set:", gbm_mse, "\n")
## GBM Mean Squared Error on Test Set: 0.908953
gbm_importance <- summary(gbm_model)

print(gbm_importance)
##                                         var    rel.inf
## hs_pcb170_cadj_Log2     hs_pcb170_cadj_Log2 9.21435394
## metab_95                           metab_95 6.12529044
## metab_161                         metab_161 4.29133133
## metab_8                             metab_8 4.20140693
## metab_49                           metab_49 3.83139716
## hs_pbde153_cadj_Log2   hs_pbde153_cadj_Log2 3.49867604
## metab_163                         metab_163 2.65872339
## metab_48                           metab_48 2.51780669
## metab_142                         metab_142 2.17504712
## metab_30                           metab_30 1.97683256
## metab_177                         metab_177 1.94854264
## hs_cu_c_Log2                   hs_cu_c_Log2 1.94736685
## metab_160                         metab_160 1.86497462
## hs_pcb153_cadj_Log2     hs_pcb153_cadj_Log2 1.78422480
## metab_26                           metab_26 1.76368807
## hs_dde_cadj_Log2           hs_dde_cadj_Log2 1.66654131
## h_cohort                           h_cohort 1.52703265
## metab_42                           metab_42 1.47041612
## hs_pfoa_c_Log2               hs_pfoa_c_Log2 1.42319938
## metab_6                             metab_6 1.34626340
## metab_50                           metab_50 1.12397363
## metab_113                         metab_113 1.07435973
## metab_59                           metab_59 1.06962782
## metab_94                           metab_94 1.04648366
## metab_47                           metab_47 1.04122724
## metab_143                         metab_143 1.01509938
## metab_122                         metab_122 1.00354677
## metab_141                         metab_141 0.98085620
## metab_110                         metab_110 0.97927747
## hs_pfos_c_Log2               hs_pfos_c_Log2 0.85948479
## metab_104                         metab_104 0.85031732
## hs_mo_c_Log2                   hs_mo_c_Log2 0.82951149
## h_bfdur_Ter                     h_bfdur_Ter 0.81510817
## metab_135                         metab_135 0.78182979
## metab_82                           metab_82 0.77474678
## metab_128                         metab_128 0.70883473
## metab_120                         metab_120 0.68928464
## metab_75                           metab_75 0.67432506
## metab_35                           metab_35 0.66146968
## hs_co_c_Log2                   hs_co_c_Log2 0.63715415
## metab_136                         metab_136 0.61846009
## metab_137                         metab_137 0.55830347
## metab_150                         metab_150 0.53711806
## metab_57                           metab_57 0.50899231
## metab_96                           metab_96 0.49870997
## metab_99                           metab_99 0.46408757
## metab_31                           metab_31 0.42133082
## metab_117                         metab_117 0.41898614
## metab_109                         metab_109 0.40265592
## metab_115                         metab_115 0.40109096
## hs_hg_c_Log2                   hs_hg_c_Log2 0.38443533
## metab_54                           metab_54 0.37000547
## metab_144                         metab_144 0.36729834
## metab_81                           metab_81 0.35109191
## metab_60                           metab_60 0.34993495
## metab_7                             metab_7 0.34974212
## metab_152                         metab_152 0.34183019
## metab_44                           metab_44 0.32823676
## metab_91                           metab_91 0.32356363
## hs_mbzp_cadj_Log2         hs_mbzp_cadj_Log2 0.31426435
## metab_116                         metab_116 0.31092421
## metab_100                         metab_100 0.29220957
## metab_53                           metab_53 0.28880255
## e3_sex_None                     e3_sex_None 0.28682635
## metab_172                         metab_172 0.28582661
## metab_176                         metab_176 0.28365827
## metab_127                         metab_127 0.28237308
## metab_56                           metab_56 0.27682862
## metab_78                           metab_78 0.27595054
## metab_146                         metab_146 0.27559918
## hs_bakery_prod_Ter       hs_bakery_prod_Ter 0.27407445
## metab_85                           metab_85 0.26867074
## hs_pfhxs_c_Log2             hs_pfhxs_c_Log2 0.26742245
## hs_child_age_None         hs_child_age_None 0.26074942
## metab_79                           metab_79 0.24910262
## metab_14                           metab_14 0.23620204
## metab_51                           metab_51 0.23523793
## metab_11                           metab_11 0.22917821
## metab_171                         metab_171 0.22833636
## metab_149                         metab_149 0.22001857
## metab_175                         metab_175 0.21559396
## metab_63                           metab_63 0.21100362
## hs_pb_c_Log2                   hs_pb_c_Log2 0.20993635
## metab_41                           metab_41 0.20768005
## metab_138                         metab_138 0.20526286
## metab_153                         metab_153 0.20351830
## metab_40                           metab_40 0.19003227
## metab_2                             metab_2 0.18640860
## metab_24                           metab_24 0.18423314
## hs_cs_c_Log2                   hs_cs_c_Log2 0.17801329
## metab_64                           metab_64 0.17743150
## metab_1                             metab_1 0.17640436
## e3_yearbir_None             e3_yearbir_None 0.17570680
## metab_33                           metab_33 0.17438034
## hs_dep_cadj_Log2           hs_dep_cadj_Log2 0.17302857
## metab_62                           metab_62 0.16987206
## metab_43                           metab_43 0.16871021
## metab_103                         metab_103 0.15846310
## metab_118                         metab_118 0.15798771
## metab_37                           metab_37 0.15519804
## metab_71                           metab_71 0.15364187
## metab_98                           metab_98 0.15261368
## metab_83                           metab_83 0.15259990
## metab_133                         metab_133 0.14825971
## hs_readymade_Ter           hs_readymade_Ter 0.14712459
## metab_86                           metab_86 0.14044892
## metab_29                           metab_29 0.13736242
## metab_4                             metab_4 0.13508193
## hs_prpa_cadj_Log2         hs_prpa_cadj_Log2 0.13340895
## metab_5                             metab_5 0.13330788
## metab_165                         metab_165 0.13234653
## metab_170                         metab_170 0.13109065
## metab_108                         metab_108 0.12703154
## hs_mnbp_cadj_Log2         hs_mnbp_cadj_Log2 0.12681607
## metab_55                           metab_55 0.12529135
## metab_132                         metab_132 0.12413135
## metab_106                         metab_106 0.12345574
## metab_27                           metab_27 0.12253579
## metab_101                         metab_101 0.11956277
## metab_38                           metab_38 0.11579047
## metab_15                           metab_15 0.11513952
## metab_23                           metab_23 0.11342822
## metab_129                         metab_129 0.11121879
## metab_77                           metab_77 0.11078187
## metab_107                         metab_107 0.10789491
## metab_173                         metab_173 0.10683381
## metab_70                           metab_70 0.10646072
## hs_cd_c_Log2                   hs_cd_c_Log2 0.10461251
## metab_36                           metab_36 0.10362290
## metab_151                         metab_151 0.10344207
## metab_20                           metab_20 0.10289213
## metab_46                           metab_46 0.10229936
## metab_97                           metab_97 0.10127479
## metab_3                             metab_3 0.10059615
## metab_162                         metab_162 0.10016182
## metab_145                         metab_145 0.09805399
## metab_9                             metab_9 0.09640116
## metab_131                         metab_131 0.09427950
## metab_68                           metab_68 0.09269352
## metab_124                         metab_124 0.09208971
## metab_112                         metab_112 0.08829838
## hs_mibp_cadj_Log2         hs_mibp_cadj_Log2 0.08635488
## metab_154                         metab_154 0.08574803
## metab_72                           metab_72 0.08511834
## metab_114                         metab_114 0.08420001
## metab_87                           metab_87 0.08415133
## metab_130                         metab_130 0.08348063
## metab_168                         metab_168 0.08090037
## metab_174                         metab_174 0.07980914
## metab_123                         metab_123 0.07974263
## metab_121                         metab_121 0.07760628
## metab_147                         metab_147 0.07447606
## metab_52                           metab_52 0.07226001
## metab_32                           metab_32 0.07158845
## metab_139                         metab_139 0.07018489
## metab_157                         metab_157 0.07018327
## metab_22                           metab_22 0.06686202
## metab_39                           metab_39 0.06219575
## metab_164                         metab_164 0.06122740
## metab_119                         metab_119 0.06107417
## metab_58                           metab_58 0.06016433
## hs_total_fish_Ter         hs_total_fish_Ter 0.05909688
## metab_89                           metab_89 0.05836652
## hs_total_fruits_Ter     hs_total_fruits_Ter 0.05719210
## metab_45                           metab_45 0.05490884
## metab_126                         metab_126 0.05470539
## hs_total_potatoes_Ter hs_total_potatoes_Ter 0.05182736
## metab_90                           metab_90 0.05103412
## metab_159                         metab_159 0.04965418
## metab_21                           metab_21 0.04806205
## metab_166                         metab_166 0.04802151
## metab_28                           metab_28 0.04606587
## metab_167                         metab_167 0.04589311
## metab_34                           metab_34 0.04473990
## metab_65                           metab_65 0.04332047
## metab_158                         metab_158 0.04317097
## metab_92                           metab_92 0.04311341
## metab_66                           metab_66 0.04174686
## metab_13                           metab_13 0.04153513
## metab_76                           metab_76 0.04130623
## metab_25                           metab_25 0.03893841
## metab_84                           metab_84 0.03602835
## metab_169                         metab_169 0.03491204
## metab_80                           metab_80 0.03474273
## metab_140                         metab_140 0.03354965
## metab_102                         metab_102 0.03335978
## metab_18                           metab_18 0.03198494
## metab_155                         metab_155 0.02609740
## metab_12                           metab_12 0.02549673
## metab_67                           metab_67 0.02531720
## metab_105                         metab_105 0.02449752
## metab_17                           metab_17 0.02417120
## metab_111                         metab_111 0.02300628
## metab_93                           metab_93 0.02261141
## metab_156                         metab_156 0.02189308
## metab_74                           metab_74 0.01993548
## metab_19                           metab_19 0.01981961
## hs_dairy_Ter                   hs_dairy_Ter 0.01899735
## metab_61                           metab_61 0.01444169
## metab_148                         metab_148 0.01440909
## metab_69                           metab_69 0.01418709
## metab_16                           metab_16 0.01380871
## hs_total_lipids_Ter     hs_total_lipids_Ter 0.01294604
## hs_org_food_Ter             hs_org_food_Ter 0.01275221
## hs_fastfood_Ter             hs_fastfood_Ter 0.00000000
## hs_total_bread_Ter       hs_total_bread_Ter 0.00000000
## hs_total_sweets_Ter     hs_total_sweets_Ter 0.00000000
## hs_total_veg_Ter           hs_total_veg_Ter 0.00000000
## metab_10                           metab_10 0.00000000
## metab_73                           metab_73 0.00000000
## metab_88                           metab_88 0.00000000
## metab_125                         metab_125 0.00000000
## metab_134                         metab_134 0.00000000

The test-set MSE was 0.908953, the average squared difference between predicted and observed BMI Z-scores. By this measure the GBM outperforms the Random Forest and Decision Tree models but falls short of the regularized linear models.

The top predictors (in terms of relative importance) are:

  • h_cohort: 17.603606

  • hs_pcb170_cadj_Log2: 14.908312

  • hs_pcb153_cadj_Log2: 12.522132

  • hs_child_age_None: 9.124261

  • e3_yearbir_None: 6.265682
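These relative-influence scores come from gbm's variable-importance summary. As a minimal, self-contained illustration of how they are extracted (fitting a small model on the built-in mtcars data rather than the study data, so all names here are illustrative):

```r
library(gbm)

set.seed(1)
# Fit a small GBM purely to illustrate the importance API
fit <- gbm(mpg ~ ., data = mtcars, distribution = "gaussian",
           n.trees = 200, interaction.depth = 3, shrinkage = 0.05,
           n.minobsinnode = 3, verbose = FALSE)

# summary() returns a data frame of variables and their relative influence,
# sorted in decreasing order; normalized scores sum to 100
imp <- summary(fit, plotit = FALSE)
head(imp)
```

The same `summary()` call on the fitted study model is what produces the ranking above.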

# External k-fold cross-validation for a GBM. Each outer fold holds out a
# test set; within each fit, gbm's internal cv.folds = 5 selects the optimal
# number of trees before predicting on the held-out data.
perform_cv_gbm <- function(data, response, k = 10, n.trees = 1000, interaction.depth = 3, n.minobsinnode = 10, shrinkage = 0.01) {
    folds <- createFolds(data[[response]], k = k, list = TRUE, returnTrain = TRUE)
    
    mse_values <- numeric(k)
    
    for (i in seq_len(k)) {
        train_indices <- folds[[i]]
        train_data <- data[train_indices, ]
        test_data <- data[-train_indices, ]
        
        gbm_model <- gbm(as.formula(paste(response, "~ .")), data = train_data, 
                         distribution = "gaussian",
                         n.trees = n.trees,
                         interaction.depth = interaction.depth,
                         n.minobsinnode = n.minobsinnode,
                         shrinkage = shrinkage,
                         cv.folds = 5, 
                         verbose = FALSE)
        
        # number of trees minimizing the internal CV error
        best_trees <- gbm.perf(gbm_model, method = "cv", plot.it = FALSE)
        gbm_predictions <- predict(gbm_model, newdata = test_data, n.trees = best_trees)
        y_test <- test_data[[response]]
        
        mse_values[i] <- mean((gbm_predictions - y_test)^2)
    }
    
    return(mse_values)
}

# Perform external cross-validation
cv_gbm_mse_values <- perform_cv_gbm(selected_metabolomics_data, "hs_zbmi_who", k = 10, n.trees = 1000, interaction.depth = 3, n.minobsinnode = 10, shrinkage = 0.01)

# Print results
cat("Cross-Validation Mean Squared Errors for GBM:", cv_gbm_mse_values, "\n")
## Cross-Validation Mean Squared Errors for GBM: 0.6487339 0.7841818 1.022545 1.010532 0.7219118 1.004807 0.7960597 0.9336514 0.8014332 0.9069533
cat("Mean MSE for GBM:", mean(cv_gbm_mse_values), "\n")
## Mean MSE for GBM: 0.8630809
cat("Standard Deviation of MSE for GBM:", sd(cv_gbm_mse_values), "\n")
## Standard Deviation of MSE for GBM: 0.131044

The GBM model’s test set MSE of 0.908953 is better than the Random Forest model (MSE: 1.005087) and the Decision Tree model (MSE: 1.545318), but worse than the LASSO (MSE: 0.7417664), Ridge (MSE: 0.7391414), and Elastic Net (MSE: 0.7391896) models.

The standard deviation of 0.131044 indicates moderate variability across folds: performance is fairly consistent, though it depends somewhat on which observations fall into each training and test split.

The GBM model performs better than the Random Forest and Decision Tree models but worse than the LASSO, Ridge, and Elastic Net models in terms of predictive accuracy. The relatively lower test set MSE and cross-validation mean MSE compared to the other tree-based models indicate that the GBM model captures the complexity of the relationships in the data more effectively. However, the regularized linear models (LASSO, Ridge, and Elastic Net) still outperform the GBM model in this specific dataset. The GBM model also provides insights into the importance of various predictor variables, which can be valuable for understanding the factors influencing BMI Z-scores.
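For reference, the test-set MSEs quoted above can be collected into a single table (base R; numbers are taken directly from the results reported in this analysis):

```r
# Test-set MSEs reported earlier in this analysis
model_mse <- data.frame(
  model    = c("Decision Tree", "Random Forest", "GBM",
               "LASSO", "Elastic Net", "Ridge"),
  test_mse = c(1.545318, 1.005087, 0.908953,
               0.7417664, 0.7391896, 0.7391414)
)

# Order from best (lowest MSE) to worst
model_mse <- model_mse[order(model_mse$test_mse), ]
model_mse
```

The ordering makes the tiering explicit: regularized linear models first, then the GBM, then the other tree-based models.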

Group LASSO

With Metabolomics

We use the group LASSO because many predictors are correlated and fall into natural groups (chemicals, diet, metabolomics, covariates); the group penalty selects or drops whole groups of related coefficients together.
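For context, a minimal self-contained grplasso sketch on simulated data (illustrative only, not the study data). An unpenalized intercept column is marked with `NA` in the group index; `lambdamax()` gives the smallest penalty that zeroes out every penalized group, from which a decreasing lambda path can be built:

```r
library(grplasso)

set.seed(1)
n <- 200
# Explicit intercept column; its NA index leaves it unpenalized
x <- cbind(Intercept = 1, matrix(rnorm(n * 6), n, 6))
y <- rbinom(n, 1, plogis(0.8 * x[, 2] - 0.8 * x[, 5]))

# Remaining columns form three penalized groups of two predictors each
index <- c(NA, 1, 1, 2, 2, 3, 3)

# Smallest lambda at which all penalized groups are exactly zero
lam_max  <- lambdamax(x, y, index = index, model = LogReg())
lambdas  <- lam_max * 0.9^(0:15)

fit <- grplasso(x, y, index = index, lambda = lambdas, model = LogReg())
# coef(fit) is a matrix: one column of coefficients per lambda value
dim(coef(fit))
```

Fitting a path of lambdas and choosing among them by validation is generally preferable to a single fixed `lambda = 0.1` as used below.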

selected_metabolomics_data <- selected_metabolomics_data %>% na.omit()

median_value <- median(selected_metabolomics_data$hs_zbmi_who, na.rm = TRUE)
selected_metabolomics_data$hs_zbmi_who_binary <- ifelse(selected_metabolomics_data$hs_zbmi_who > median_value, 1, 0)

set.seed(101)
trainIndex <- caret::createDataPartition(selected_metabolomics_data$hs_zbmi_who_binary, p = .7, list = FALSE, times = 1)
train_data <- selected_metabolomics_data[trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]

train_data_clean <- train_data[complete.cases(train_data), ]
test_data_clean <- test_data[complete.cases(test_data), ]

x_train <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, data = train_data_clean)[, -1]
y_train <- as.numeric(train_data_clean$hs_zbmi_who_binary)

x_test <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, data = test_data_clean)[, -1]
y_test <- as.numeric(test_data_clean$hs_zbmi_who_binary)

num_chemicals <- length(chemicals_selected)
num_diet <- length(diet_selected)
num_metabolomics <- ncol(metabol_serum_transposed) - 1  # Excluding ID
num_covariates <- ncol(outcome_and_cov) - 3  # Excluding ID and outcome

# Combine all the lengths
total_length <- num_chemicals + num_diet + num_metabolomics + num_covariates
cat("Total length of predictors:", total_length, "\n")
## Total length of predictors: 214
cat("Number of predictors in x_train:", ncol(x_train), "\n")
## Number of predictors in x_train: 235
group_indices <- c(
  rep(1, num_chemicals),  # Group 1: Chemicals
  rep(2, num_diet),       # Group 2: Postnatal diet
  rep(3, num_metabolomics), # Group 3: Metabolomics (excluding ID)
  rep(4, num_covariates)  # Group 4: Covariates (excluding ID and outcome)
)

# model.matrix expands each factor into multiple dummy columns, so x_train has
# more columns (235) than raw predictors (214). Pad with a catch-all group so
# the index vector matches ncol(x_train); a finer approach would map each dummy
# column back to its original variable's group.
if (length(group_indices) < ncol(x_train)) {
  group_indices <- c(group_indices, rep(5, ncol(x_train) - length(group_indices)))
} else if (length(group_indices) > ncol(x_train)) {
  group_indices <- group_indices[1:ncol(x_train)]
}

cat("Length of group_indices:", length(group_indices), "\n")
## Length of group_indices: 235
cat("Number of columns in x_train:", ncol(x_train), "\n")
## Number of columns in x_train: 235
group_lasso_model <- grplasso(x_train, y_train, index = group_indices, lambda = 0.1, model = LogReg())
## Couldn't find intercept. Setting center = FALSE.
## Lambda: 0.1  nr.var: 235
coef(group_lasso_model)
##                                        0.1
## hs_child_age_None              -0.59594179
## h_cohort2                       2.94619215
## h_cohort3                       4.20544075
## h_cohort4                       2.28549526
## h_cohort5                       3.53575315
## h_cohort6                       1.33791029
## e3_sex_Nonemale                 0.55010520
## e3_yearbir_None2004            -0.30669786
## e3_yearbir_None2005             1.33785133
## e3_yearbir_None2006             1.45611112
## e3_yearbir_None2007             4.21420838
## e3_yearbir_None2008             3.81865197
## e3_yearbir_None2009             5.98227152
## hs_cd_c_Log2                   -0.02991542
## hs_co_c_Log2                    0.40144931
## hs_cs_c_Log2                    0.38734130
## hs_cu_c_Log2                    0.30042794
## hs_hg_c_Log2                    0.03471235
## hs_mo_c_Log2                   -0.25757374
## hs_pb_c_Log2                    0.36380980
## hs_dde_cadj_Log2               -0.03789731
## hs_pcb153_cadj_Log2            -0.88685635
## hs_pcb170_cadj_Log2            -0.03483853
## hs_dep_cadj_Log2               -0.10223967
## hs_pbde153_cadj_Log2           -0.10936927
## hs_pfhxs_c_Log2                 0.17587504
## hs_pfoa_c_Log2                 -0.74412273
## hs_pfos_c_Log2                  0.24728137
## hs_prpa_cadj_Log2              -0.03488188
## hs_mbzp_cadj_Log2               0.16940808
## hs_mibp_cadj_Log2              -0.06761674
## hs_mnbp_cadj_Log2              -0.16596006
## h_bfdur_Ter(10.8,34.9]          0.49071128
## h_bfdur_Ter(34.9,Inf]           0.82884074
## hs_bakery_prod_Ter(2,6]        -0.33175502
## hs_bakery_prod_Ter(6,Inf]      -0.62215623
## hs_dairy_Ter(14.6,25.6]         0.08482188
## hs_dairy_Ter(25.6,Inf]          0.31733174
## hs_fastfood_Ter(0.132,0.5]     -0.50342416
## hs_fastfood_Ter(0.5,Inf]       -0.39055825
## hs_org_food_Ter(0.132,1]        0.57735040
## hs_org_food_Ter(1,Inf]          0.58461426
## hs_readymade_Ter(0.132,0.5]    -0.09298378
## hs_readymade_Ter(0.5,Inf]       0.05665265
## hs_total_bread_Ter(7,17.5]     -0.82759896
## hs_total_bread_Ter(17.5,Inf]   -0.30922869
## hs_total_fish_Ter(1.5,3]        0.12036133
## hs_total_fish_Ter(3,Inf]        0.03468958
## hs_total_fruits_Ter(7,14.1]     0.18078333
## hs_total_fruits_Ter(14.1,Inf]   0.90674706
## hs_total_lipids_Ter(3,7]        0.56975800
## hs_total_lipids_Ter(7,Inf]      0.70113214
## hs_total_potatoes_Ter(3,4]      0.10370866
## hs_total_potatoes_Ter(4,Inf]    0.02634925
## hs_total_sweets_Ter(4.1,8.5]   -0.10764106
## hs_total_sweets_Ter(8.5,Inf]    0.24628178
## hs_total_veg_Ter(6,8.5]         0.37180310
## hs_total_veg_Ter(8.5,Inf]      -0.70637597
## metab_1                        -0.21934113
## metab_2                         0.54016142
## metab_3                        -0.74200611
## metab_4                         0.06568424
## metab_5                         1.37132520
## metab_6                         0.64176747
## metab_7                         0.29293822
## metab_8                         1.32214376
## metab_9                        -2.25106218
## metab_10                        1.68737529
## metab_11                       -0.14497922
## metab_12                        1.50079257
## metab_13                       -1.21689862
## metab_14                       -5.32966069
## metab_15                        1.21236562
## metab_16                        2.85306453
## metab_17                       -0.78502372
## metab_18                       -2.97579131
## metab_19                       -0.66505335
## metab_20                       -1.74223862
## metab_21                       -0.52106555
## metab_22                        0.18268569
## metab_23                       -0.22152468
## metab_24                        1.36898453
## metab_25                        2.38256922
## metab_26                       -0.23673689
## metab_27                        1.17306729
## metab_28                        2.50077667
## metab_29                       -0.42455639
## metab_30                        0.56194371
## metab_31                        0.25305593
## metab_32                       -0.90190386
## metab_33                        0.46296206
## metab_34                       -2.33614366
## metab_35                       -1.20435717
## metab_36                        3.56188300
## metab_37                        0.05864185
## metab_38                       -0.51713510
## metab_39                       -0.30852596
## metab_40                        0.70085987
## metab_41                       -0.60605517
## metab_42                       -0.96417064
## metab_43                       -0.36379011
## metab_44                        1.03711946
## metab_45                        0.77318389
## metab_46                        0.23898441
## metab_47                        2.27854547
## metab_48                       -4.04965259
## metab_49                        1.11250731
## metab_50                       -0.41745634
## metab_51                        1.18096488
## metab_52                        4.10323092
## metab_53                       -0.93368389
## metab_54                        1.43641582
## metab_55                        1.93852307
## metab_56                       -0.03250476
## metab_57                       -3.87166609
## metab_58                        4.35090796
## metab_59                       -0.34237371
## metab_60                        2.20494743
## metab_61                       -5.69362107
## metab_62                        6.68588408
## metab_63                       -1.16115309
## metab_64                        1.26355362
## metab_65                       -4.62622247
## metab_66                       -0.08370978
## metab_67                       -2.35035264
## metab_68                        1.67410749
## metab_69                        2.02867560
## metab_70                        0.93282328
## metab_71                       -2.65532957
## metab_72                       -0.23011708
## metab_73                       -1.48347533
## metab_74                       -1.38113227
## metab_75                       -0.81775989
## metab_76                        0.92821973
## metab_77                        0.10425366
## metab_78                        0.43722605
## metab_79                        0.11121327
## metab_80                        4.97291644
## metab_81                        1.43961430
## metab_82                      -14.26682996
## metab_83                        1.32235179
## metab_84                       -1.53978599
## metab_85                       -3.72908363
## metab_86                        3.57884167
## metab_87                        7.19099249
## metab_88                        1.40098003
## metab_89                      -10.65496621
## metab_90                       10.66417995
## metab_91                        1.41255415
## metab_92                       -0.42693424
## metab_93                       -5.42880812
## metab_94                        0.04874111
## metab_95                        8.77043869
## metab_96                        0.42518460
## metab_97                        0.86170719
## metab_98                       -3.41052985
## metab_99                       -2.47278476
## metab_100                      -0.20728184
## metab_101                      -1.78718463
## metab_102                      -2.78128756
## metab_103                      -0.11445587
## metab_104                       2.50227154
## metab_105                      -0.64500045
## metab_106                      -0.35289814
## metab_107                       2.74546219
## metab_108                       1.57505918
## metab_109                      -0.16557016
## metab_110                       0.13745282
## metab_111                      -4.49215071
## metab_112                      -0.73576223
## metab_113                       5.59108227
## metab_114                       7.70992520
## metab_115                       0.94851577
## metab_116                      -2.83155127
## metab_117                       2.16588882
## metab_118                      -6.08119649
## metab_119                      -2.69324396
## metab_120                      -0.98390405
## metab_121                       0.95205810
## metab_122                      -4.35303802
## metab_123                       6.49564123
## metab_124                       7.59604599
## metab_125                      -2.33794629
## metab_126                       0.39290751
## metab_127                      -0.46954422
## metab_128                       1.30103700
## metab_129                      -0.33323546
## metab_130                      -8.58220959
## metab_131                      -5.24394711
## metab_132                      -3.32858163
## metab_133                      -4.56405527
## metab_134                       3.38559845
## metab_135                      -4.22847726
## metab_136                      -0.33050194
## metab_137                       2.97520459
## metab_138                       6.61980970
## metab_139                       1.74260596
## metab_140                       0.15080761
## metab_141                      -3.03582258
## metab_142                       0.01740251
## metab_143                      -0.47378851
## metab_144                       4.24884611
## metab_145                      -1.65688181
## metab_146                      -1.28339994
## metab_147                       0.02082390
## metab_148                       0.18016510
## metab_149                      -0.43500152
## metab_150                       0.63571180
## metab_151                       0.01113420
## metab_152                      -0.27922344
## metab_153                       0.10717311
## metab_154                       0.07663273
## metab_155                      -2.62303160
## metab_156                       3.48385076
## metab_157                      -0.63739240
## metab_158                       0.53092448
## metab_159                      -0.51607989
## metab_160                     -11.02724555
## metab_161                      12.78034433
## metab_162                       3.61007873
## metab_163                      -3.12580833
## metab_164                      -0.31620579
## metab_165                       3.22067819
## metab_166                      -2.34266060
## metab_167                      -0.52238720
## metab_168                      -0.73765010
## metab_169                      -0.65010404
## metab_170                      -0.40325886
## metab_171                      -0.61414868
## metab_172                      -0.94817162
## metab_173                       2.78369971
## metab_174                      -2.06248937
## metab_175                      -2.32118641
## metab_176                       1.30062902
## metab_177                       1.17342731
group_lasso_predictions <- predict(group_lasso_model, newdata = x_test, type = "response")
# convert probabilities to binary predictions
binary_predictions <- ifelse(group_lasso_predictions > 0.5, 1, 0)

accuracy <- mean(binary_predictions == y_test)
cat("Group LASSO Accuracy on Test Set:", accuracy, "\n")
## Group LASSO Accuracy on Test Set: 0.7122905
conf_matrix <- confusionMatrix(factor(binary_predictions), factor(y_test))
conf_matrix
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 128  52
##          1  51 127
##                                           
##                Accuracy : 0.7123          
##                  95% CI : (0.6624, 0.7587)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : 2.631e-16       
##                                           
##                   Kappa : 0.4246          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.7151          
##             Specificity : 0.7095          
##          Pos Pred Value : 0.7111          
##          Neg Pred Value : 0.7135          
##              Prevalence : 0.5000          
##          Detection Rate : 0.3575          
##    Detection Prevalence : 0.5028          
##       Balanced Accuracy : 0.7123          
##                                           
##        'Positive' Class : 0               
## 
# ROC Curve and AUC
roc_curve <- roc(y_test, group_lasso_predictions)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Group LASSO Model (with metabolomics)")

auc_value <- auc(roc_curve)
cat("Group LASSO AUC on Test Set:", auc_value)
## Group LASSO AUC on Test Set: 0.7795637

Dichotomizing the continuous outcome at its median yields roughly balanced classes, which makes accuracy straightforward to interpret (the no-information rate is 0.5), though it does discard information in the continuous BMI Z-score.
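The balance this produces can be checked directly; a quick base-R illustration on simulated Z-scores (with continuous data and no ties, a strict median split sends half of the observations to each class):

```r
set.seed(42)
z <- rnorm(1000)  # stand-in for continuous BMI Z-scores

# Code values strictly above the sample median as 1, the rest as 0
z_binary <- ifelse(z > median(z), 1, 0)

# With an even number of tie-free values, the split is exactly 50/50
table(z_binary)
prop.table(table(z_binary))
```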

The Group LASSO model shows a good ability to predict the binary outcome with an accuracy of 71.2% and an AUC of 0.7796.

Without Metabolomics

finalized_data <- finalized_data %>% na.omit()

median_value <- median(finalized_data$hs_zbmi_who, na.rm = TRUE)
finalized_data$hs_zbmi_who_binary <- ifelse(finalized_data$hs_zbmi_who > median_value, 1, 0)

set.seed(101)
trainIndex <- createDataPartition(finalized_data$hs_zbmi_who_binary, p = .7, list = FALSE, times = 1)
train_data <- finalized_data[trainIndex,]
test_data  <- finalized_data[-trainIndex,]

train_data_clean <- train_data[complete.cases(train_data), ]

x_train <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, data = train_data_clean)[,-1]
y_train <- as.numeric(train_data_clean$hs_zbmi_who_binary)

test_data_clean <- test_data[complete.cases(test_data), ]

x_test <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, data = test_data_clean)[,-1]
y_test <- as.numeric(test_data_clean$hs_zbmi_who_binary)

num_chemicals <- length(chemicals_selected)
num_diet <- length(diet_selected)
num_covariates <- ncol(outcome_and_cov) - 2  # excluding outcome and binary outcome

total_length <- num_chemicals + num_diet + num_covariates

group_indices <- c(
    rep(1, num_chemicals),  # Group 1: Chemicals
    rep(2, num_diet),  # Group 2: Postnatal diet
    rep(3, num_covariates)  # Group 3: Covariates (excluding outcome)
)

length(group_indices) == ncol(x_train)
## [1] FALSE
# pad with a catch-all group so the index matches the dummy-expanded design matrix
if (length(group_indices) < ncol(x_train)) {
    group_indices <- c(group_indices, rep(4, ncol(x_train) - length(group_indices)))
}

length(group_indices) == ncol(x_train)
## [1] TRUE
group_lasso_model <- grplasso(x_train, y_train, index = group_indices, lambda = 0.1, model = LogReg())
## Couldn't find intercept. Setting center = FALSE.
## Lambda: 0.1  nr.var: 58
group_lasso_coef <- coef(group_lasso_model)
print(group_lasso_coef)
##                                        0.1
## e3_sex_Nonemale                0.221920506
## e3_yearbir_None2004           -0.294855639
## e3_yearbir_None2005            0.158323963
## e3_yearbir_None2006            0.408984249
## e3_yearbir_None2007            0.680806654
## e3_yearbir_None2008            0.841852904
## e3_yearbir_None2009            1.513577179
## h_cohort2                      1.813429084
## h_cohort3                      1.891844790
## h_cohort4                      1.394374662
## h_cohort5                      0.829935236
## h_cohort6                      0.930012500
## hs_child_age_None             -0.239459872
## h_bfdur_Ter(10.8,34.9]         0.023986987
## h_bfdur_Ter(34.9,Inf]          0.420097494
## hs_bakery_prod_Ter(2,6]       -0.356797510
## hs_bakery_prod_Ter(6,Inf]     -0.662368185
## hs_dairy_Ter(14.6,25.6]        0.167086366
## hs_dairy_Ter(25.6,Inf]        -0.081599238
## hs_fastfood_Ter(0.132,0.5]     0.140662512
## hs_fastfood_Ter(0.5,Inf]       0.099048410
## hs_org_food_Ter(0.132,1]       0.143989602
## hs_org_food_Ter(1,Inf]         0.110093372
## hs_readymade_Ter(0.132,0.5]   -0.008761025
## hs_readymade_Ter(0.5,Inf]      0.012590159
## hs_total_bread_Ter(7,17.5]    -0.222727634
## hs_total_bread_Ter(17.5,Inf]  -0.140549045
## hs_total_fish_Ter(1.5,3]      -0.026376780
## hs_total_fish_Ter(3,Inf]       0.198175822
## hs_total_fruits_Ter(7,14.1]    0.183218671
## hs_total_fruits_Ter(14.1,Inf]  0.160101985
## hs_total_lipids_Ter(3,7]      -0.136445270
## hs_total_lipids_Ter(7,Inf]    -0.181943661
## hs_total_potatoes_Ter(3,4]    -0.009198169
## hs_total_potatoes_Ter(4,Inf]  -0.024564718
## hs_total_sweets_Ter(4.1,8.5]  -0.188806743
## hs_total_sweets_Ter(8.5,Inf]  -0.008076773
## hs_total_veg_Ter(6,8.5]        0.061858512
## hs_total_veg_Ter(8.5,Inf]     -0.132253798
## hs_cd_c_Log2                  -0.003912680
## hs_co_c_Log2                   0.020024149
## hs_cs_c_Log2                   0.391925922
## hs_cu_c_Log2                   0.459302065
## hs_hg_c_Log2                   0.015248669
## hs_mo_c_Log2                  -0.202463754
## hs_pb_c_Log2                  -0.161514332
## hs_dde_cadj_Log2              -0.145937394
## hs_pcb153_cadj_Log2           -0.740350353
## hs_pcb170_cadj_Log2           -0.101634637
## hs_dep_cadj_Log2              -0.040875393
## hs_pbde153_cadj_Log2          -0.057679189
## hs_pfhxs_c_Log2                0.088655509
## hs_pfoa_c_Log2                -0.342463752
## hs_pfos_c_Log2                 0.024346959
## hs_prpa_cadj_Log2             -0.021925097
## hs_mbzp_cadj_Log2              0.177492500
## hs_mibp_cadj_Log2             -0.114760937
## hs_mnbp_cadj_Log2             -0.090520610
group_lasso_predictions <- predict(group_lasso_model, newdata = x_test, type = "response")
binary_predictions <- ifelse(group_lasso_predictions > 0.5, 1, 0)

accuracy <- mean(binary_predictions == y_test)
cat("Group LASSO Accuracy on Test Set:", accuracy, "\n")
## Group LASSO Accuracy on Test Set: 0.6589744
conf_matrix <- confusionMatrix(factor(binary_predictions), factor(y_test))
conf_matrix
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 130  74
##          1  59 127
##                                           
##                Accuracy : 0.659           
##                  95% CI : (0.6096, 0.7059)
##     No Information Rate : 0.5154          
##     P-Value [Acc > NIR] : 6.812e-09       
##                                           
##                   Kappa : 0.3189          
##                                           
##  Mcnemar's Test P-Value : 0.2248          
##                                           
##             Sensitivity : 0.6878          
##             Specificity : 0.6318          
##          Pos Pred Value : 0.6373          
##          Neg Pred Value : 0.6828          
##              Prevalence : 0.4846          
##          Detection Rate : 0.3333          
##    Detection Prevalence : 0.5231          
##       Balanced Accuracy : 0.6598          
##                                           
##        'Positive' Class : 0               
## 
roc_curve <- roc(y_test, group_lasso_predictions)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Group LASSO Model (without metabolomics)")

auc_value <- auc(roc_curve)
cat("Group LASSO AUC on Test Set:", auc_value, "\n")
## Group LASSO AUC on Test Set: 0.7135487

The AUC of 0.7135 indicates acceptable discriminatory ability: an AUC of 1 corresponds to perfect discrimination, whereas an AUC of 0.5 corresponds to no discriminative power.
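The AUC has a direct probabilistic reading: it is the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, and it can be computed from ranks alone (equivalently, the Mann-Whitney U statistic). A base-R sketch on toy scores:

```r
# Rank-based AUC: P(score of random case > score of random control),
# with ties counted as one half (U statistic divided by n1 * n0)
auc_rank <- function(scores, labels) {
    r  <- rank(scores)           # mid-ranks handle ties
    n1 <- sum(labels == 1)
    n0 <- sum(labels == 0)
    (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# A perfectly separating score gives AUC = 1
auc_rank(c(0.1, 0.2, 0.8, 0.9), c(0, 0, 1, 1))  # 1
# Partial separation: cases beat controls in 3 of 4 pairings
auc_rank(c(0.2, 0.8, 0.4, 0.9), c(0, 0, 1, 1))  # 0.75
```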

The accuracy dropped from 71.2% to 65.9% when metabolomics data was excluded, indicating that metabolomics data provided valuable information for prediction. The AUC also dropped from 0.7796 to 0.7135, suggesting that the model’s ability to distinguish between the classes was better when metabolomics data was included.

Excluding metabolomics data from the Group LASSO model resulted in a decrease in both accuracy and AUC, highlighting the importance of metabolomics data in predicting the binary outcome. While the model without metabolomics still performed reasonably well, the inclusion of metabolomics data improved the model’s performance and discriminative ability. This underscores the value of high-dimensional data like metabolomics in enhancing predictive models in health-related studies.
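The with- and without-metabolomics results can be summarized side by side (base R; values are those reported above):

```r
comparison <- data.frame(
  model    = c("Group LASSO + metabolomics", "Group LASSO, no metabolomics"),
  accuracy = c(0.7123, 0.6590),
  auc      = c(0.7796, 0.7135)
)
comparison

# Performance drop attributable to removing the metabolomics block
diff(comparison$accuracy)  # -0.0533
diff(comparison$auc)       # -0.0661
```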

Binary Outcome?

As a sensitivity analysis, we dichotomize BMI Z-scores at the median and refit a penalized logistic model. Viewing the outcome dichotomously lets us report sensitivity and specificity, giving another perspective on how well observed cases are predicted.

# convert hs_zbmi_who to binary based on median
median_value <- median(selected_metabolomics_data$hs_zbmi_who, na.rm = TRUE)
selected_metabolomics_data$hs_zbmi_who_binary <- ifelse(selected_metabolomics_data$hs_zbmi_who > median_value, 1, 0)

set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who_binary, p = .7, 
                                  list = FALSE, 
                                  times = 1)
train_data <- selected_metabolomics_data[trainIndex,]
test_data  <- selected_metabolomics_data[-trainIndex,]

x_train <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, train_data)[,-1]
y_train <- train_data$hs_zbmi_who_binary
x_test <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, test_data)[,-1]
y_test <- test_data$hs_zbmi_who_binary

# freeze the covariates (penalty factor 0) so they are not shrunk by the LASSO
penalty_factors <- rep(1, ncol(x_train))
penalty_factors[colnames(x_train) %in% covariates_selected] <- 0

# fit LASSO model using cross-validation
lasso_model <- cv.glmnet(x_train, y_train, alpha = 1, family = "binomial", penalty.factor = penalty_factors)
plot(lasso_model)

best_lambda <- lasso_model$lambda.min
cat("Best Lambda:", best_lambda, "\n")
## Best Lambda: 0.01062675
# Get coefficients at best lambda
coef(lasso_model, s = best_lambda)
## 236 x 1 sparse Matrix of class "dgCMatrix"
##                                         s1
## (Intercept)                    9.817829848
## hs_child_age_None             -0.251653165
## h_cohort2                      .          
## h_cohort3                      0.044695737
## h_cohort4                      .          
## h_cohort5                      0.028175680
## h_cohort6                      .          
## e3_sex_Nonemale                0.192655978
## e3_yearbir_None2004           -0.317216991
## e3_yearbir_None2005            .          
## e3_yearbir_None2006            0.129471025
## e3_yearbir_None2007            .          
## e3_yearbir_None2008            .          
## e3_yearbir_None2009            0.736310968
## hs_cd_c_Log2                   .          
## hs_co_c_Log2                   .          
## hs_cs_c_Log2                   0.105137708
## hs_cu_c_Log2                   .          
## hs_hg_c_Log2                   .          
## hs_mo_c_Log2                  -0.041748252
## hs_pb_c_Log2                   .          
## hs_dde_cadj_Log2              -0.004843192
## hs_pcb153_cadj_Log2           -0.268635299
## hs_pcb170_cadj_Log2           -0.012523063
## hs_dep_cadj_Log2              -0.022730853
## hs_pbde153_cadj_Log2          -0.047039163
## hs_pfhxs_c_Log2                .          
## hs_pfoa_c_Log2                -0.249474279
## hs_pfos_c_Log2                 .          
## hs_prpa_cadj_Log2              .          
## hs_mbzp_cadj_Log2              0.023418147
## hs_mibp_cadj_Log2             -0.003071779
## hs_mnbp_cadj_Log2             -0.016537507
## h_bfdur_Ter(10.8,34.9]         0.144044878
## h_bfdur_Ter(34.9,Inf]          .          
## hs_bakery_prod_Ter(2,6]        .          
## hs_bakery_prod_Ter(6,Inf]     -0.329179099
## hs_dairy_Ter(14.6,25.6]        .          
## hs_dairy_Ter(25.6,Inf]         .          
## hs_fastfood_Ter(0.132,0.5]     .          
## hs_fastfood_Ter(0.5,Inf]       .          
## hs_org_food_Ter(0.132,1]       0.039012159
## hs_org_food_Ter(1,Inf]         0.039368797
## hs_readymade_Ter(0.132,0.5]    .          
## hs_readymade_Ter(0.5,Inf]      .          
## hs_total_bread_Ter(7,17.5]    -0.035205982
## hs_total_bread_Ter(17.5,Inf]   .          
## hs_total_fish_Ter(1.5,3]       .          
## hs_total_fish_Ter(3,Inf]       .          
## hs_total_fruits_Ter(7,14.1]    .          
## hs_total_fruits_Ter(14.1,Inf]  .          
## hs_total_lipids_Ter(3,7]       .          
## hs_total_lipids_Ter(7,Inf]     .          
## hs_total_potatoes_Ter(3,4]     .          
## hs_total_potatoes_Ter(4,Inf]  -0.120164768
## hs_total_sweets_Ter(4.1,8.5]   .          
## hs_total_sweets_Ter(8.5,Inf]   .          
## hs_total_veg_Ter(6,8.5]        0.094335123
## hs_total_veg_Ter(8.5,Inf]     -0.069207480
## metab_1                        .          
## metab_2                        .          
## metab_3                        .          
## metab_4                        0.144314385
## metab_5                        0.828767149
## metab_6                        .          
## metab_7                        .          
## metab_8                        0.261196684
## metab_9                       -0.033055477
## metab_10                       .          
## metab_11                       .          
## metab_12                       .          
## metab_13                       .          
## metab_14                       .          
## metab_15                       .          
## metab_16                       .          
## metab_17                       .          
## metab_18                       .          
## metab_19                       .          
## metab_20                       .          
## metab_21                       .          
## metab_22                       .          
## metab_23                       0.154116670
## metab_24                       0.097326692
## metab_25                       .          
## metab_26                      -0.401875623
## metab_27                       .          
## metab_28                       0.219970733
## metab_29                       .          
## metab_30                       0.041596958
## metab_31                       .          
## metab_32                       .          
## metab_33                       .          
## metab_34                       .          
## metab_35                       .          
## metab_36                       0.472184237
## metab_37                      -0.063497362
## metab_38                      -0.286007253
## metab_39                       .          
## metab_40                       .          
## metab_41                       .          
## metab_42                       .          
## metab_43                       .          
## metab_44                       .          
## metab_45                       .          
## metab_46                      -0.097725775
## metab_47                       0.471796517
## metab_48                      -0.940319550
## metab_49                       0.754075021
## metab_50                      -0.219406985
## metab_51                       .          
## metab_52                       .          
## metab_53                       .          
## metab_54                       .          
## metab_55                       .          
## metab_56                       .          
## metab_57                       .          
## metab_58                       .          
## metab_59                       0.303553431
## metab_60                       .          
## metab_61                       .          
## metab_62                       .          
## metab_63                       .          
## metab_64                       .          
## metab_65                       0.029889154
## metab_66                       .          
## metab_67                       .          
## metab_68                       .          
## metab_69                       .          
## metab_70                       .          
## metab_71                      -0.376435940
## metab_72                       .          
## metab_73                       .          
## metab_74                       .          
## metab_75                       .          
## metab_76                       .          
## metab_77                       .          
## metab_78                       .          
## metab_79                       .          
## metab_80                       .          
## metab_81                       0.471287971
## metab_82                      -1.272264014
## metab_83                       .          
## metab_84                       .          
## metab_85                       .          
## metab_86                       .          
## metab_87                       .          
## metab_88                       .          
## metab_89                       .          
## metab_90                       .          
## metab_91                       .          
## metab_92                       .          
## metab_93                       .          
## metab_94                       .          
## metab_95                       2.014499436
## metab_96                       .          
## metab_97                       .          
## metab_98                       .          
## metab_99                       .          
## metab_100                      .          
## metab_101                      .          
## metab_102                      .          
## metab_103                      .          
## metab_104                      0.286489996
## metab_105                      .          
## metab_106                      .          
## metab_107                      .          
## metab_108                      .          
## metab_109                      .          
## metab_110                      .          
## metab_111                      .          
## metab_112                      .          
## metab_113                      0.460764717
## metab_114                      .          
## metab_115                      0.347194110
## metab_116                      .          
## metab_117                      .          
## metab_118                     -1.037174755
## metab_119                      .          
## metab_120                      .          
## metab_121                      .          
## metab_122                     -0.788296600
## metab_123                      .          
## metab_124                      .          
## metab_125                      .          
## metab_126                      .          
## metab_127                     -0.140179156
## metab_128                      .          
## metab_129                      .          
## metab_130                      .          
## metab_131                      .          
## metab_132                      .          
## metab_133                     -0.584432596
## metab_134                      .          
## metab_135                      .          
## metab_136                      .          
## metab_137                      .          
## metab_138                      .          
## metab_139                      .          
## metab_140                      .          
## metab_141                      .          
## metab_142                     -0.230347741
## metab_143                      .          
## metab_144                      .          
## metab_145                     -0.551440685
## metab_146                     -0.016224421
## metab_147                      .          
## metab_148                      .          
## metab_149                      .          
## metab_150                      .          
## metab_151                      0.059077406
## metab_152                     -0.074721859
## metab_153                      .          
## metab_154                      .          
## metab_155                      .          
## metab_156                      .          
## metab_157                      .          
## metab_158                      .          
## metab_159                      .          
## metab_160                     -2.968895206
## metab_161                      4.140540623
## metab_162                      .          
## metab_163                      0.070762874
## metab_164                      .          
## metab_165                      .          
## metab_166                      .          
## metab_167                      .          
## metab_168                      .          
## metab_169                      .          
## metab_170                     -0.115476857
## metab_171                      .          
## metab_172                     -0.395680467
## metab_173                      0.753967982
## metab_174                      .          
## metab_175                      .          
## metab_176                      .          
## metab_177                      0.001169494
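Most of the entries in the sparse coefficient matrix above are `.` (shrunk exactly to zero by the LASSO penalty). A small helper, assuming the `lasso_model` and `best_lambda` objects from the chunk above, can condense the listing to just the selected predictors, ranked by effect size:

```r
# Sketch: pull the nonzero LASSO coefficients into a sorted data frame.
# Assumes lasso_model (a cv.glmnet fit) and best_lambda exist as above.
lasso_coefs <- coef(lasso_model, s = best_lambda)   # dgCMatrix, 236 x 1
keep <- lasso_coefs[, 1] != 0
nonzero <- data.frame(
  predictor   = rownames(lasso_coefs)[keep],
  coefficient = lasso_coefs[keep, 1]
)
# largest absolute coefficients first
nonzero[order(-abs(nonzero$coefficient)), ]
```

Note these are coefficients on the log-odds scale, so the metabolites with the largest magnitudes (e.g., `metab_161`, `metab_160`, `metab_95` above) dominate the model's predictions.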
lasso_predictions <- predict(lasso_model, s = best_lambda, newx = x_test, type = "response")

# convert probabilities to binary predictions
binary_predictions <- ifelse(lasso_predictions > 0.5, 1, 0)

# make sure levels match between binary_predictions and y_test
binary_predictions <- factor(binary_predictions, levels = c(0, 1))
y_test <- factor(y_test, levels = c(0, 1))

# evaluate accuracy
accuracy <- mean(binary_predictions == y_test)
cat("LASSO Accuracy on Test Set:", accuracy, "\n")
## LASSO Accuracy on Test Set: 0.7486034
conf_matrix <- confusionMatrix(binary_predictions, y_test)
conf_matrix
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 134  45
##          1  45 134
##                                           
##                Accuracy : 0.7486          
##                  95% CI : (0.7003, 0.7927)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.4972          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.7486          
##             Specificity : 0.7486          
##          Pos Pred Value : 0.7486          
##          Neg Pred Value : 0.7486          
##              Prevalence : 0.5000          
##          Detection Rate : 0.3743          
##    Detection Prevalence : 0.5000          
##       Balanced Accuracy : 0.7486          
##                                           
##        'Positive' Class : 0               
## 
# ROC Curve and AUC
roc_curve <- roc(as.numeric(y_test), as.numeric(lasso_predictions))
## Setting levels: control = 1, case = 2
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for LASSO Model")

auc_value <- auc(roc_curve)
cat("LASSO AUC on Test Set:", auc_value, "\n")
## LASSO AUC on Test Set: 0.8109922
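The 0.5 cutoff used above is a default, not necessarily optimal. Since the ROC object is already computed, `pROC::coords()` can suggest a threshold; a minimal sketch, assuming the `roc_curve` object from the chunk above:

```r
# Sketch: find the probability cutoff maximizing Youden's J
# (sensitivity + specificity - 1) on the LASSO ROC curve.
coords(roc_curve, "best",
       ret = c("threshold", "sensitivity", "specificity"))
```

With balanced classes (prevalence 0.50 here) the optimum usually sits near 0.5, but re-checking is cheap and matters more when classes are imbalanced.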
# fit ridge model using cross-validation
ridge_model <- cv.glmnet(x_train, y_train, alpha = 0, family = "binomial", penalty.factor = penalty_factors)
plot(ridge_model)

best_lambda <- ridge_model$lambda.min
cat("Best Lambda:", best_lambda, "\n")
## Best Lambda: 0.08324346
coef(ridge_model, s = best_lambda)
## 236 x 1 sparse Matrix of class "dgCMatrix"
##                                         s1
## (Intercept)                   -1.809225168
## hs_child_age_None             -0.208177751
## h_cohort2                     -0.107089432
## h_cohort3                      0.197974549
## h_cohort4                      0.209756733
## h_cohort5                      0.135408870
## h_cohort6                      0.186726177
## e3_sex_Nonemale                0.137798221
## e3_yearbir_None2004           -0.246506659
## e3_yearbir_None2005           -0.052896853
## e3_yearbir_None2006            0.068810191
## e3_yearbir_None2007            0.062783282
## e3_yearbir_None2008            0.012632403
## e3_yearbir_None2009            0.927421970
## hs_cd_c_Log2                  -0.021122463
## hs_co_c_Log2                   0.088088328
## hs_cs_c_Log2                   0.146404152
## hs_cu_c_Log2                   0.138273081
## hs_hg_c_Log2                  -0.015516599
## hs_mo_c_Log2                  -0.078642644
## hs_pb_c_Log2                  -0.034045476
## hs_dde_cadj_Log2              -0.061770761
## hs_pcb153_cadj_Log2           -0.220225673
## hs_pcb170_cadj_Log2           -0.035529685
## hs_dep_cadj_Log2              -0.027618181
## hs_pbde153_cadj_Log2          -0.040730399
## hs_pfhxs_c_Log2                0.028112158
## hs_pfoa_c_Log2                -0.252610886
## hs_pfos_c_Log2                -0.002786721
## hs_prpa_cadj_Log2             -0.000525372
## hs_mbzp_cadj_Log2              0.052422989
## hs_mibp_cadj_Log2             -0.020707407
## hs_mnbp_cadj_Log2             -0.071237867
## h_bfdur_Ter(10.8,34.9]         0.207846745
## h_bfdur_Ter(34.9,Inf]          0.100098228
## hs_bakery_prod_Ter(2,6]       -0.012663208
## hs_bakery_prod_Ter(6,Inf]     -0.283052759
## hs_dairy_Ter(14.6,25.6]       -0.008102662
## hs_dairy_Ter(25.6,Inf]        -0.019822211
## hs_fastfood_Ter(0.132,0.5]    -0.086603637
## hs_fastfood_Ter(0.5,Inf]      -0.069775565
## hs_org_food_Ter(0.132,1]       0.140824426
## hs_org_food_Ter(1,Inf]         0.184880984
## hs_readymade_Ter(0.132,0.5]    0.137639210
## hs_readymade_Ter(0.5,Inf]      0.079390909
## hs_total_bread_Ter(7,17.5]    -0.143167666
## hs_total_bread_Ter(17.5,Inf]  -0.060129945
## hs_total_fish_Ter(1.5,3]       0.008124037
## hs_total_fish_Ter(3,Inf]      -0.039199624
## hs_total_fruits_Ter(7,14.1]    0.011059895
## hs_total_fruits_Ter(14.1,Inf]  0.097543052
## hs_total_lipids_Ter(3,7]       0.027161670
## hs_total_lipids_Ter(7,Inf]    -0.040417211
## hs_total_potatoes_Ter(3,4]     0.032975962
## hs_total_potatoes_Ter(4,Inf]  -0.111575155
## hs_total_sweets_Ter(4.1,8.5]   0.055377893
## hs_total_sweets_Ter(8.5,Inf]   0.056718516
## hs_total_veg_Ter(6,8.5]        0.140673301
## hs_total_veg_Ter(8.5,Inf]     -0.140493878
## metab_1                        0.007538623
## metab_2                        0.281224451
## metab_3                        0.074348226
## metab_4                        0.088929100
## metab_5                        0.618308156
## metab_6                       -0.153094359
## metab_7                        0.059456801
## metab_8                        0.439621088
## metab_9                       -0.134996656
## metab_10                       0.091855257
## metab_11                      -0.162356200
## metab_12                      -0.157001523
## metab_13                      -0.066769586
## metab_14                      -0.129796305
## metab_15                       0.024366917
## metab_16                       0.090563684
## metab_17                      -0.125620365
## metab_18                      -0.004839800
## metab_19                       0.005515473
## metab_20                      -0.090937000
## metab_21                       0.264474089
## metab_22                      -0.205930568
## metab_23                       0.250566911
## metab_24                       0.573793312
## metab_25                       0.038657923
## metab_26                      -0.323211888
## metab_27                       0.136376885
## metab_28                       0.357397513
## metab_29                      -0.084064498
## metab_30                       0.197773052
## metab_31                       0.007393019
## metab_32                      -0.110777713
## metab_33                      -0.061378462
## metab_34                       0.067539283
## metab_35                      -0.072199801
## metab_36                       0.416566431
## metab_37                      -0.168488839
## metab_38                      -0.267978332
## metab_39                      -0.137272482
## metab_40                       0.405065140
## metab_41                       0.141570500
## metab_42                      -0.183264700
## metab_43                      -0.239993398
## metab_44                      -0.086654969
## metab_45                       0.031265898
## metab_46                      -0.248364871
## metab_47                       0.505659820
## metab_48                      -0.745798109
## metab_49                       0.494471833
## metab_50                      -0.313486840
## metab_51                       0.294586642
## metab_52                       0.373071654
## metab_53                       0.129818278
## metab_54                       0.277918313
## metab_55                       0.100162468
## metab_56                      -0.104660299
## metab_57                       0.201307996
## metab_58                      -0.044052411
## metab_59                       0.289651023
## metab_60                      -0.128506971
## metab_61                       0.219615424
## metab_62                      -0.059384012
## metab_63                      -0.058620855
## metab_64                       0.156126047
## metab_65                       0.106344109
## metab_66                      -0.062778058
## metab_67                      -0.100105501
## metab_68                       0.051912593
## metab_69                       0.038681009
## metab_70                       0.073972041
## metab_71                      -0.385996152
## metab_72                      -0.043217695
## metab_73                      -0.192567068
## metab_74                      -0.040132610
## metab_75                       0.323485351
## metab_76                      -0.153685664
## metab_77                       0.003039420
## metab_78                      -0.220290749
## metab_79                       0.019424777
## metab_80                       0.136788190
## metab_81                       0.451581347
## metab_82                      -0.449812894
## metab_83                      -0.250707995
## metab_84                      -0.153544955
## metab_85                       0.068411054
## metab_86                       0.050731830
## metab_87                       0.182707559
## metab_88                       0.313617041
## metab_89                      -0.139571982
## metab_90                      -0.187490759
## metab_91                       0.081716071
## metab_92                       0.133209403
## metab_93                      -0.184978136
## metab_94                      -0.057730301
## metab_95                       0.807059317
## metab_96                       0.297032578
## metab_97                      -0.013424773
## metab_98                      -0.067322885
## metab_99                      -0.173421751
## metab_100                      0.248299787
## metab_101                      0.011163645
## metab_102                      0.228913699
## metab_103                      0.190131211
## metab_104                      0.294612243
## metab_105                      0.116106247
## metab_106                      0.031018889
## metab_107                      0.169784257
## metab_108                      0.250156056
## metab_109                     -0.134976113
## metab_110                     -0.159635582
## metab_111                     -0.252264697
## metab_112                      0.052258078
## metab_113                      0.487159973
## metab_114                      0.318732997
## metab_115                      0.384010479
## metab_116                     -0.241586008
## metab_117                     -0.241946201
## metab_118                     -0.395524712
## metab_119                     -0.154178293
## metab_120                     -0.291672788
## metab_121                     -0.192266638
## metab_122                     -0.345381156
## metab_123                     -0.274400476
## metab_124                     -0.149888606
## metab_125                     -0.069621938
## metab_126                     -0.050541059
## metab_127                     -0.112417212
## metab_128                     -0.205789532
## metab_129                      0.057532031
## metab_130                     -0.424290003
## metab_131                     -0.108779395
## metab_132                      0.016765656
## metab_133                     -0.482504421
## metab_134                      0.196762332
## metab_135                      0.024258600
## metab_136                     -0.277529924
## metab_137                      0.051303964
## metab_138                     -0.029370385
## metab_139                      0.205425565
## metab_140                     -0.174001305
## metab_141                     -0.185009698
## metab_142                     -0.349682499
## metab_143                     -0.211671773
## metab_144                      0.278987607
## metab_145                     -0.471221663
## metab_146                     -0.233276903
## metab_147                      0.154733005
## metab_148                      0.152238437
## metab_149                     -0.166547580
## metab_150                      0.125948755
## metab_151                      0.123639632
## metab_152                     -0.067496019
## metab_153                     -0.178180029
## metab_154                     -0.001515247
## metab_155                     -0.132020110
## metab_156                     -0.140257823
## metab_157                      0.088574692
## metab_158                      0.144857975
## metab_159                      0.110695626
## metab_160                     -0.582409538
## metab_161                      1.352636126
## metab_162                      0.060967806
## metab_163                      0.668133017
## metab_164                      0.106721097
## metab_165                     -0.040820127
## metab_166                     -0.269071657
## metab_167                     -0.005412347
## metab_168                     -0.135225474
## metab_169                     -0.098770912
## metab_170                     -0.129765986
## metab_171                     -0.013013038
## metab_172                     -0.290478830
## metab_173                      0.542461994
## metab_174                     -0.281669521
## metab_175                      0.062822150
## metab_176                      0.198782592
## metab_177                      0.164851092
ridge_predictions <- predict(ridge_model, s = best_lambda, newx = x_test, type = "response")

# convert probabilities to binary predictions
binary_predictions <- ifelse(ridge_predictions > 0.5, 1, 0)

# make sure levels match between binary_predictions and y_test
binary_predictions <- factor(binary_predictions, levels = c(0, 1))
y_test <- factor(y_test, levels = c(0, 1))

# evaluate accuracy
accuracy <- mean(binary_predictions == y_test)
cat("Ridge Accuracy on Test Set:", accuracy, "\n")
## Ridge Accuracy on Test Set: 0.7150838
conf_matrix <- confusionMatrix(binary_predictions, y_test)
conf_matrix
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 124  47
##          1  55 132
##                                           
##                Accuracy : 0.7151          
##                  95% CI : (0.6653, 0.7613)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.4302          
##                                           
##  Mcnemar's Test P-Value : 0.4882          
##                                           
##             Sensitivity : 0.6927          
##             Specificity : 0.7374          
##          Pos Pred Value : 0.7251          
##          Neg Pred Value : 0.7059          
##              Prevalence : 0.5000          
##          Detection Rate : 0.3464          
##    Detection Prevalence : 0.4777          
##       Balanced Accuracy : 0.7151          
##                                           
##        'Positive' Class : 0               
## 
# ROC Curve and AUC
roc_curve <- roc(as.numeric(y_test), as.numeric(ridge_predictions))
## Setting levels: control = 1, case = 2
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Ridge Model")

auc_value <- auc(roc_curve)
cat("Ridge AUC on Test Set:", auc_value, "\n")
## Ridge AUC on Test Set: 0.8064667
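The LASSO and ridge AUCs (0.811 vs. 0.806) are close, and the difference may not be statistically meaningful. Since both chunks reuse the name `roc_curve`, a comparison requires rebuilding the two ROC objects; a hedged sketch, assuming `lasso_predictions` and `ridge_predictions` are still in the workspace:

```r
# Sketch: DeLong test for a difference between correlated AUCs (pROC).
roc_lasso <- roc(as.numeric(y_test), as.numeric(lasso_predictions))
roc_ridge <- roc(as.numeric(y_test), as.numeric(ridge_predictions))
roc.test(roc_lasso, roc_ridge)  # default method = "delong"
```

A large p-value here would indicate the sparser LASSO model achieves comparable discrimination with far fewer predictors, which favors it for biomarker interpretation.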
# fit enet model using cross-validation
enet_model <- cv.glmnet(x_train, y_train, alpha = 0.5, family = "binomial", penalty.factor = penalty_factors)
plot(enet_model)

best_lambda <- enet_model$lambda.min
cat("Best Lambda:", best_lambda, "\n")
## Best Lambda: 0.01764503
coef(enet_model, s = best_lambda)
## 236 x 1 sparse Matrix of class "dgCMatrix"
##                                          s1
## (Intercept)                    5.2519539319
## hs_child_age_None             -0.2568910841
## h_cohort2                      .           
## h_cohort3                      0.1132396908
## h_cohort4                      .           
## h_cohort5                      0.0835385406
## h_cohort6                      .           
## e3_sex_Nonemale                0.1763631440
## e3_yearbir_None2004           -0.2869898855
## e3_yearbir_None2005            .           
## e3_yearbir_None2006            0.1052923898
## e3_yearbir_None2007            .           
## e3_yearbir_None2008            .           
## e3_yearbir_None2009            0.7816007150
## hs_cd_c_Log2                   .           
## hs_co_c_Log2                   .           
## hs_cs_c_Log2                   0.1226618816
## hs_cu_c_Log2                   .           
## hs_hg_c_Log2                   .           
## hs_mo_c_Log2                  -0.0515756931
## hs_pb_c_Log2                   .           
## hs_dde_cadj_Log2              -0.0183668149
## hs_pcb153_cadj_Log2           -0.2688132682
## hs_pcb170_cadj_Log2           -0.0196421068
## hs_dep_cadj_Log2              -0.0236732706
## hs_pbde153_cadj_Log2          -0.0466027441
## hs_pfhxs_c_Log2                .           
## hs_pfoa_c_Log2                -0.2622711860
## hs_pfos_c_Log2                 .           
## hs_prpa_cadj_Log2              .           
## hs_mbzp_cadj_Log2              0.0354129754
## hs_mibp_cadj_Log2              .           
## hs_mnbp_cadj_Log2             -0.0369884956
## h_bfdur_Ter(10.8,34.9]         0.1547177620
## h_bfdur_Ter(34.9,Inf]          .           
## hs_bakery_prod_Ter(2,6]        .           
## hs_bakery_prod_Ter(6,Inf]     -0.3301919563
## hs_dairy_Ter(14.6,25.6]        .           
## hs_dairy_Ter(25.6,Inf]         .           
## hs_fastfood_Ter(0.132,0.5]     .           
## hs_fastfood_Ter(0.5,Inf]       .           
## hs_org_food_Ter(0.132,1]       0.0878872611
## hs_org_food_Ter(1,Inf]         0.1014419628
## hs_readymade_Ter(0.132,0.5]    .           
## hs_readymade_Ter(0.5,Inf]      .           
## hs_total_bread_Ter(7,17.5]    -0.0639619030
## hs_total_bread_Ter(17.5,Inf]   .           
## hs_total_fish_Ter(1.5,3]       .           
## hs_total_fish_Ter(3,Inf]       .           
## hs_total_fruits_Ter(7,14.1]    .           
## hs_total_fruits_Ter(14.1,Inf]  0.0020140570
## hs_total_lipids_Ter(3,7]       .           
## hs_total_lipids_Ter(7,Inf]     .           
## hs_total_potatoes_Ter(3,4]     .           
## hs_total_potatoes_Ter(4,Inf]  -0.1200486506
## hs_total_sweets_Ter(4.1,8.5]   .           
## hs_total_sweets_Ter(8.5,Inf]   .           
## hs_total_veg_Ter(6,8.5]        0.1192182564
## hs_total_veg_Ter(8.5,Inf]     -0.0908939754
## metab_1                        .           
## metab_2                        .           
## metab_3                        .           
## metab_4                        0.1139744587
## metab_5                        0.8141634476
## metab_6                        .           
## metab_7                        .           
## metab_8                        0.3244413922
## metab_9                       -0.0970712000
## metab_10                       .           
## metab_11                       .           
## metab_12                      -0.0034986374
## metab_13                       .           
## metab_14                       .           
## metab_15                       .           
## metab_16                       .           
## metab_17                       .           
## metab_18                       .           
## metab_19                       .           
## metab_20                       .           
## metab_21                       .           
## metab_22                      -0.0388865423
## metab_23                       0.1807355595
## metab_24                       0.2594875809
## metab_25                       .           
## metab_26                      -0.4817320352
## metab_27                       .           
## metab_28                       0.3391386930
## metab_29                       .           
## metab_30                       0.0548106184
## metab_31                       .           
## metab_32                       .           
## metab_33                       .           
## metab_34                       .           
## metab_35                       .           
## metab_36                       0.4505270358
## metab_37                      -0.1107056885
## metab_38                      -0.3136102371
## metab_39                       .           
## metab_40                       .           
## metab_41                       .           
## metab_42                       .           
## metab_43                       .           
## metab_44                       .           
## metab_45                       .           
## metab_46                      -0.1411560452
## metab_47                       0.5132799985
## metab_48                      -0.9203234513
## metab_49                       0.7296022958
## metab_50                      -0.2579007191
## metab_51                       .           
## metab_52                       .           
## metab_53                       .           
## metab_54                       0.0532388725
## metab_55                       .           
## metab_56                       .           
## metab_57                       .           
## metab_58                       .           
## metab_59                       0.3253444973
## metab_60                       .           
## metab_61                       0.0008841859
## metab_62                       .           
## metab_63                       .           
## metab_64                       .           
## metab_65                       0.0799770070
## metab_66                       .           
## metab_67                       .           
## metab_68                       .           
## metab_69                       .           
## metab_70                       .           
## metab_71                      -0.4373499008
## metab_72                       .           
## metab_73                       .           
## metab_74                       .           
## metab_75                       .           
## metab_76                       .           
## metab_77                       .           
## metab_78                       .           
## metab_79                       .           
## metab_80                       .           
## metab_81                       0.5196293799
## metab_82                      -0.9998581194
## metab_83                       .           
## metab_84                       .           
## metab_85                       .           
## metab_86                       .           
## metab_87                       .           
## metab_88                       .           
## metab_89                       .           
## metab_90                       .           
## metab_91                       .           
## metab_92                       0.0462111112
## metab_93                       .           
## metab_94                       .           
## metab_95                       1.8159679172
## metab_96                       .           
## metab_97                       .           
## metab_98                       .           
## metab_99                       .           
## metab_100                      .           
## metab_101                      .           
## metab_102                      .           
## metab_103                      .           
## metab_104                      0.3220121363
## metab_105                      .           
## metab_106                      .           
## metab_107                      .           
## metab_108                      .           
## metab_109                      .           
## metab_110                      .           
## metab_111                      .           
## metab_112                      .           
## metab_113                      0.5565705069
## metab_114                      .           
## metab_115                      0.3747031426
## metab_116                     -0.2203697857
## metab_117                      .           
## metab_118                     -0.7791808620
## metab_119                      .           
## metab_120                     -0.1261682210
## metab_121                      .           
## metab_122                     -0.6614627830
## metab_123                     -0.2286655474
## metab_124                      .           
## metab_125                      .           
## metab_126                      .           
## metab_127                     -0.1246327049
## metab_128                     -0.0473762605
## metab_129                      .           
## metab_130                     -0.1268721751
## metab_131                      .           
## metab_132                      .           
## metab_133                     -0.6280866974
## metab_134                      .           
## metab_135                      .           
## metab_136                      .           
## metab_137                      .           
## metab_138                      .           
## metab_139                      .           
## metab_140                      .           
## metab_141                      .           
## metab_142                     -0.2770473374
## metab_143                      .           
## metab_144                      0.0023998660
## metab_145                     -0.5912503914
## metab_146                      .           
## metab_147                      .           
## metab_148                      .           
## metab_149                      .           
## metab_150                      0.0253786667
## metab_151                      0.0869995911
## metab_152                     -0.0779875717
## metab_153                      .           
## metab_154                      .           
## metab_155                      .           
## metab_156                      .           
## metab_157                      .           
## metab_158                      .           
## metab_159                      .           
## metab_160                     -1.8982894818
## metab_161                      2.9769001011
## metab_162                      .           
## metab_163                      0.4055194288
## metab_164                      .           
## metab_165                      .           
## metab_166                     -0.1661472739
## metab_167                      .           
## metab_168                      .           
## metab_169                      .           
## metab_170                     -0.1111756978
## metab_171                      .           
## metab_172                     -0.3958754065
## metab_173                      0.7484104382
## metab_174                      .           
## metab_175                      .           
## metab_176                      .           
## metab_177                      0.1113704413
enet_predictions <- predict(enet_model, s = best_lambda, newx = x_test, type = "response")

# convert probabilities to binary predictions
binary_predictions <- ifelse(enet_predictions > 0.5, 1, 0)

# make sure levels match between binary_predictions and y_test
binary_predictions <- factor(binary_predictions, levels = c(0, 1))
y_test <- factor(y_test, levels = c(0, 1))

# accuracy on the held-out test set
accuracy <- mean(binary_predictions == y_test)
cat("Elastic Net Accuracy on Test Set:", accuracy, "\n")
## Elastic Net Accuracy on Test Set: 0.7430168
conf_matrix <- confusionMatrix(binary_predictions, y_test)
conf_matrix
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 132  45
##          1  47 134
##                                           
##                Accuracy : 0.743           
##                  95% CI : (0.6945, 0.7875)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.486           
##                                           
##  Mcnemar's Test P-Value : 0.917           
##                                           
##             Sensitivity : 0.7374          
##             Specificity : 0.7486          
##          Pos Pred Value : 0.7458          
##          Neg Pred Value : 0.7403          
##              Prevalence : 0.5000          
##          Detection Rate : 0.3687          
##    Detection Prevalence : 0.4944          
##       Balanced Accuracy : 0.7430          
##                                           
##        'Positive' Class : 0               
## 
# ROC Curve and AUC (as.numeric() on the factor yields 1/2 codes, hence the
# "control = 1, case = 2" message below; the AUC itself is unaffected)
roc_curve <- roc(as.numeric(y_test), as.numeric(enet_predictions))
## Setting levels: control = 1, case = 2
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Elastic Net Model")

auc_value <- auc(roc_curve)
cat("Elastic Net AUC on Test Set:", auc_value, "\n")
## Elastic Net AUC on Test Set: 0.8092756
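Since the AUC (0.81) exceeds the accuracy obtained at the fixed 0.5 cutoff (0.74), a different probability threshold might trade sensitivity against specificity more favorably. As a sketch not part of the original analysis, pROC's `coords()` can report the cutoff that maximizes Youden's J on the fitted ROC object:

```r
# locate the Youden-optimal threshold on the elastic net ROC curve
# (roc_curve is the pROC object computed above)
best_cut <- coords(roc_curve, "best",
                   ret = c("threshold", "sensitivity", "specificity"))
best_cut
```

Whether re-thresholding is appropriate depends on the relative costs of false positives and false negatives, which the fixed 0.5 cutoff implicitly treats as equal.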
selected_metabolomics_data <- selected_metabolomics_data %>% na.omit()

# dichotomize hs_zbmi_who at its median (above median = 1, otherwise 0)
median_value <- median(selected_metabolomics_data$hs_zbmi_who, na.rm = TRUE)
selected_metabolomics_data$hs_zbmi_who_binary <- ifelse(selected_metabolomics_data$hs_zbmi_who > median_value, 1, 0)
selected_metabolomics_data$hs_zbmi_who_binary <- factor(selected_metabolomics_data$hs_zbmi_who_binary, levels = c(0, 1), labels = c("0", "1"))
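A median split should yield roughly balanced classes, but ties at the median can skew the counts slightly. A quick check, using only the data frame constructed above:

```r
# verify the two classes are roughly balanced after the median split
table(selected_metabolomics_data$hs_zbmi_who_binary)
prop.table(table(selected_metabolomics_data$hs_zbmi_who_binary))
```

The confusion matrices later in this section report a prevalence of 0.5, consistent with a balanced split.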

set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who_binary, p = .7, 
                                  list = FALSE, 
                                  times = 1)
train_data <- selected_metabolomics_data[trainIndex,]
test_data  <- selected_metabolomics_data[-trainIndex,]

# build design matrices (note: these retain the continuous hs_zbmi_who column;
# the random forest below drops it explicitly via the formula)
x_train <- model.matrix(hs_zbmi_who_binary ~ . , train_data)[,-1]
y_train <- train_data$hs_zbmi_who_binary
x_test <- model.matrix(hs_zbmi_who_binary ~ . , test_data)[,-1]
y_test <- test_data$hs_zbmi_who_binary

set.seed(101)
rf_model <- randomForest(hs_zbmi_who_binary ~ . -hs_zbmi_who, data = train_data, ntree = 500)

rf_predictions_prob <- predict(rf_model, newdata = test_data, type = "prob")[,2]
rf_predictions <- predict(rf_model, newdata = test_data)

# for 0/1 class labels, squared-error loss reduces to the misclassification rate
rf_mse <- mean((as.numeric(as.character(rf_predictions)) - as.numeric(as.character(y_test)))^2)
cat("Random Forest Mean Squared Error on Test Set:", rf_mse, "\n")
## Random Forest Mean Squared Error on Test Set: 0.3044693
importance(rf_model)
##                       MeanDecreaseGini
## hs_child_age_None            1.7882343
## h_cohort                     3.4070742
## e3_sex_None                  0.0976352
## e3_yearbir_None              1.5901350
## hs_cd_c_Log2                 2.0336364
## hs_co_c_Log2                 1.7684074
## hs_cs_c_Log2                 2.1538054
## hs_cu_c_Log2                 2.1707957
## hs_hg_c_Log2                 1.9356511
## hs_mo_c_Log2                 2.2003423
## hs_pb_c_Log2                 1.9728173
## hs_dde_cadj_Log2             3.5904604
## hs_pcb153_cadj_Log2          5.7437384
## hs_pcb170_cadj_Log2          8.9557364
## hs_dep_cadj_Log2             2.2198722
## hs_pbde153_cadj_Log2         5.2959632
## hs_pfhxs_c_Log2              2.4815853
## hs_pfoa_c_Log2               4.5279272
## hs_pfos_c_Log2               2.6316318
## hs_prpa_cadj_Log2            1.7365372
## hs_mbzp_cadj_Log2            2.1180311
## hs_mibp_cadj_Log2            1.7210491
## hs_mnbp_cadj_Log2            1.7487483
## h_bfdur_Ter                  0.4371279
## hs_bakery_prod_Ter           0.8757978
## hs_dairy_Ter                 0.2981300
## hs_fastfood_Ter              0.2984583
## hs_org_food_Ter              0.2857926
## hs_readymade_Ter             0.2855018
## hs_total_bread_Ter           0.2939936
## hs_total_fish_Ter            0.3398121
## hs_total_fruits_Ter          0.2781647
## hs_total_lipids_Ter          0.2981745
## hs_total_potatoes_Ter        0.4314328
## hs_total_sweets_Ter          0.3220028
## hs_total_veg_Ter             0.6506433
## metab_1                      1.8748120
## metab_2                      2.3992827
## metab_3                      2.2773235
## metab_4                      3.9668801
## metab_5                      2.3578607
## metab_6                      2.0442188
## metab_7                      2.1914771
## metab_8                      5.3874192
## metab_9                      1.7440494
## metab_10                     1.8119869
## metab_11                     1.5557174
## metab_12                     1.6841044
## metab_13                     1.2247874
## metab_14                     1.4551697
## metab_15                     1.7061037
## metab_16                     1.4974011
## metab_17                     0.8980096
## metab_18                     1.2762252
## metab_19                     1.2240781
## metab_20                     1.6235781
## metab_21                     1.4388117
## metab_22                     1.0604520
## metab_23                     1.1651123
## metab_24                     1.3404663
## metab_25                     1.5245764
## metab_26                     1.6856714
## metab_27                     2.1028728
## metab_28                     2.8648042
## metab_29                     1.8565476
## metab_30                     3.4944202
## metab_31                     1.6097240
## metab_32                     1.4494120
## metab_33                     1.6711737
## metab_34                     1.0676798
## metab_35                     1.5825802
## metab_36                     1.5879570
## metab_37                     1.1340969
## metab_38                     1.1811165
## metab_39                     1.6070836
## metab_40                     1.6419862
## metab_41                     1.2629394
## metab_42                     1.1285223
## metab_43                     1.7463093
## metab_44                     2.1058742
## metab_45                     2.0415691
## metab_46                     1.7021922
## metab_47                     3.0012894
## metab_48                     2.8036281
## metab_49                     9.0266778
## metab_50                     2.3541918
## metab_51                     2.0795759
## metab_52                     2.1507585
## metab_53                     2.8595225
## metab_54                     2.7272182
## metab_55                     2.4345109
## metab_56                     2.2137258
## metab_57                     1.6384552
## metab_58                     1.6279814
## metab_59                     2.2909219
## metab_60                     1.8986732
## metab_61                     1.8852996
## metab_62                     1.4288996
## metab_63                     1.6858152
## metab_64                     1.7536434
## metab_65                     1.9244427
## metab_66                     1.4721162
## metab_67                     1.6857963
## metab_68                     1.7017285
## metab_69                     1.6427850
## metab_70                     1.6356357
## metab_71                     1.9173119
## metab_72                     2.0229571
## metab_73                     1.7524860
## metab_74                     1.4490926
## metab_75                     1.7655133
## metab_76                     1.6905191
## metab_77                     1.9247036
## metab_78                     1.9534272
## metab_79                     1.8695118
## metab_80                     1.8984313
## metab_81                     1.7828383
## metab_82                     2.0424690
## metab_83                     1.8894423
## metab_84                     1.8533558
## metab_85                     1.8645443
## metab_86                     1.5504325
## metab_87                     1.4357740
## metab_88                     1.4163450
## metab_89                     1.2861551
## metab_90                     1.5796906
## metab_91                     1.7200843
## metab_92                     1.5725267
## metab_93                     1.4787178
## metab_94                     2.1266446
## metab_95                     7.0238869
## metab_96                     4.4088649
## metab_97                     1.8927510
## metab_98                     1.3704423
## metab_99                     1.7864630
## metab_100                    1.4251670
## metab_101                    1.2973353
## metab_102                    3.2840264
## metab_103                    1.7889091
## metab_104                    1.8402342
## metab_105                    1.6699572
## metab_106                    1.5347103
## metab_107                    1.6705529
## metab_108                    1.4950683
## metab_109                    1.4447063
## metab_110                    1.8322554
## metab_111                    2.2671589
## metab_112                    1.5165425
## metab_113                    1.9298389
## metab_114                    1.6078074
## metab_115                    1.6271083
## metab_116                    2.4586914
## metab_117                    2.0121790
## metab_118                    2.3112340
## metab_119                    1.6028247
## metab_120                    2.3347458
## metab_121                    1.7493994
## metab_122                    2.9261486
## metab_123                    1.8338084
## metab_124                    1.8727595
## metab_125                    1.5783007
## metab_126                    1.7118489
## metab_127                    2.6049567
## metab_128                    1.9949492
## metab_129                    1.4050579
## metab_130                    1.5640162
## metab_131                    1.2280257
## metab_132                    1.6077375
## metab_133                    1.6844817
## metab_134                    1.7214976
## metab_135                    1.5103019
## metab_136                    1.4840693
## metab_137                    1.7323210
## metab_138                    1.6517019
## metab_139                    1.2228561
## metab_140                    1.5962479
## metab_141                    2.2717847
## metab_142                    2.4059012
## metab_143                    1.9307291
## metab_144                    1.9357969
## metab_145                    1.9087173
## metab_146                    2.1261243
## metab_147                    1.8271987
## metab_148                    1.6705805
## metab_149                    1.6899110
## metab_150                    1.5871588
## metab_151                    2.0640985
## metab_152                    2.1115571
## metab_153                    1.7320060
## metab_154                    2.1527091
## metab_155                    1.6830787
## metab_156                    1.6323637
## metab_157                    1.8792394
## metab_158                    1.5623017
## metab_159                    1.8392205
## metab_160                    1.9005237
## metab_161                    5.6264133
## metab_162                    1.6604760
## metab_163                    3.6263841
## metab_164                    2.5022988
## metab_165                    1.7772000
## metab_166                    1.4581948
## metab_167                    1.5775242
## metab_168                    1.5930518
## metab_169                    1.9307432
## metab_170                    2.2579057
## metab_171                    2.7163044
## metab_172                    1.8586232
## metab_173                    1.8181690
## metab_174                    1.7265370
## metab_175                    1.9941063
## metab_176                    3.4282201
## metab_177                    3.8501640
varImpPlot(rf_model)
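The full importance table above is hard to scan across nearly 200 predictors. A short sketch to pull out the ten highest-ranked features by mean decrease in Gini impurity, using only the `rf_model` object already fitted:

```r
# top ten predictors by mean decrease in Gini impurity
imp <- importance(rf_model)
head(sort(imp[, "MeanDecreaseGini"], decreasing = TRUE), 10)
```

By this criterion, `metab_49`, `hs_pcb170_cadj_Log2`, and `metab_95` sit at the top, broadly consistent with the nonzero elastic net coefficients for the PCB exposures and metabolites such as `metab_95` and `metab_161`.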

# ROC Curve and AUC
roc_curve <- roc(as.numeric(as.character(y_test)), as.numeric(as.character(rf_predictions_prob)))
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Random Forest Model")

auc_value <- auc(roc_curve)
cat("Random Forest AUC on Test Set:", auc_value, "\n")
## Random Forest AUC on Test Set: 0.7623982
conf_matrix <- confusionMatrix(rf_predictions, y_test)
print(conf_matrix)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 117  47
##          1  62 132
##                                          
##                Accuracy : 0.6955         
##                  95% CI : (0.645, 0.7428)
##     No Information Rate : 0.5            
##     P-Value [Acc > NIR] : 4.947e-14      
##                                          
##                   Kappa : 0.3911         
##                                          
##  Mcnemar's Test P-Value : 0.1799         
##                                          
##             Sensitivity : 0.6536         
##             Specificity : 0.7374         
##          Pos Pred Value : 0.7134         
##          Neg Pred Value : 0.6804         
##              Prevalence : 0.5000         
##          Detection Rate : 0.3268         
##    Detection Prevalence : 0.4581         
##       Balanced Accuracy : 0.6955         
##                                          
##        'Positive' Class : 0              
##
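To compare the two classifiers visually, the elastic net and random forest ROC curves can be overlaid. Because both curves above were assigned to the same `roc_curve` name, a direct overlay requires saving them separately first; the sketch below assumes `roc_enet` and `roc_rf` are the two pROC objects computed as above but stored under distinct names:

```r
# overlay both ROC curves; roc_enet and roc_rf are assumed to be the
# pROC objects computed earlier, saved under distinct names
plot(roc_enet, col = "steelblue", main = "Elastic Net vs. Random Forest ROC")
plot(roc_rf, col = "darkorange", add = TRUE)
legend("bottomright",
       legend = c(sprintf("Elastic Net (AUC = %.3f)", auc(roc_enet)),
                  sprintf("Random Forest (AUC = %.3f)", auc(roc_rf))),
       col = c("steelblue", "darkorange"), lwd = 2)
```

On this test set the elastic net dominates on both accuracy (0.743 vs. 0.696) and AUC (0.809 vs. 0.762), suggesting the largely linear signal in these exposures and metabolites is captured well by the penalized logistic model.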